25
backend/.env.example
Normal file
@@ -0,0 +1,25 @@
# Example environment variables for the Python service
# Copy this file to .env and adjust as needed

# ========== Runtime environment ==========
# Allowed values: dev, prod
# Default: dev
ENV=dev

# ========== Optional: override the database settings from the config file ==========
# When set, the variables below override the matching entries in config.{ENV}.yaml
# DB_HOST=localhost
# DB_PORT=3306
# DB_USER=root
# DB_PASSWORD=your_password
# DB_NAME=ai_wht

# ========== Optional: override the scheduler settings ==========
# SCHEDULER_ENABLED=true
# SCHEDULER_CRON=*/5 * * * * *
# SCHEDULER_MAX_CONCURRENT=2
# SCHEDULER_PUBLISH_TIMEOUT=300
# SCHEDULER_MAX_ARTICLES_PER_USER_PER_RUN=2
# SCHEDULER_MAX_FAILURES_PER_USER_PER_RUN=3
# SCHEDULER_MAX_DAILY_ARTICLES_PER_USER=6
# SCHEDULER_MAX_HOURLY_ARTICLES_PER_USER=2
112
backend/CONFIG_GUIDE.md
Normal file
@@ -0,0 +1,112 @@
# Python Service Configuration Guide

## Configuration File Layout

The Python service now uses the same configuration file layout as the Go service:

```
backend/
├── config.dev.yaml       # Development configuration
├── config.prod.yaml      # Production configuration
├── config.py             # Configuration loader module
├── .env.example          # Example environment variables
└── .env                  # Environment variables (create manually; ignored by Git)
```

## Switching Environments

Set the `ENV` environment variable to switch environments:

### Windows (CMD)
```bash
set ENV=dev
python main.py
```

### Windows (PowerShell)
```powershell
$env:ENV="dev"
python main.py
```

### Linux/Mac
```bash
ENV=dev python main.py
```

Alternatively, set it in the `.env` file:
```
ENV=dev
```
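
As a rough illustration of how `ENV` could select a configuration file, the sketch below parses a simple `.env` file and returns the matching `config.{ENV}.yaml` path. The helper names `read_dotenv` and `select_config_file` are hypothetical, and the rule that a real environment variable wins over the `.env` entry is an assumption; the actual logic lives in `config.py` and may differ.

```python
import os
from pathlib import Path


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip()
    return values


def select_config_file(base_dir: Path, environ=None) -> Path:
    """Return config.{ENV}.yaml; assumes the process environment beats .env."""
    environ = os.environ if environ is None else environ
    dotenv = read_dotenv(base_dir / ".env")
    env = environ.get("ENV") or dotenv.get("ENV") or "dev"  # documented default: dev
    if env not in ("dev", "prod"):
        raise ValueError(f"unsupported ENV value: {env!r}")
    return base_dir / f"config.{env}.yaml"
```

Calling `select_config_file(Path("backend"))` with no `ENV` set anywhere would return `backend/config.dev.yaml` under these assumptions.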

## Configuration Priority

1. **Environment variables** - highest priority
2. **Configuration file** - `config.{ENV}.yaml`
3. **Code defaults** - lowest priority
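
The three layers above can be sketched as a simple merge. This is a minimal example, not the project's loader: `DEFAULTS`, `ENV_OVERRIDES`, and `resolve_database_config` are hypothetical names, and the default values are illustrative ones borrowed from `.env.example`.

```python
import os

# Code defaults: lowest priority (illustrative values only).
DEFAULTS = {"host": "localhost", "port": 3306, "username": "root", "dbname": "ai_wht"}

# Hypothetical mapping: environment variable -> (config key, converter).
ENV_OVERRIDES = {
    "DB_HOST": ("host", str),
    "DB_PORT": ("port", int),
    "DB_USER": ("username", str),
    "DB_NAME": ("dbname", str),
}


def resolve_database_config(file_config: dict, environ=None) -> dict:
    """Merge defaults, config-file values, and environment overrides in that order."""
    environ = os.environ if environ is None else environ
    merged = {**DEFAULTS, **file_config}  # the config file beats code defaults
    for var, (key, convert) in ENV_OVERRIDES.items():
        if environ.get(var):  # a non-empty environment variable beats the file
            merged[key] = convert(environ[var])
    return merged
```

Converting values at the override step (for example `DB_PORT` to `int`) keeps the merged result type-compatible with what a YAML file would provide.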

## Configuration Reference

### Development (config.dev.yaml)

- **Database**: local MySQL (localhost:3306)
- **Scheduler**: enabled, runs every 5 seconds (for testing)
- **Log level**: DEBUG

### Production (config.prod.yaml)

- **Database**: remote MySQL (8.149.233.36:3306)
- **Scheduler**: enabled, runs every 5 minutes
- **Log level**: INFO

## Usage Examples

### 1. Development

Create a `.env` file:
```bash
ENV=dev
```

Start the service:
```bash
python main.py
```

### 2. Production

Create a `.env` file:
```bash
ENV=prod
```

Start the service:
```bash
python main.py
```

### 3. Overriding Configuration

To change individual settings temporarily, add them to `.env`:
```bash
ENV=dev
DB_HOST=192.168.1.100
SCHEDULER_CRON=0 */10 * * * *
```
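
The `SCHEDULER_CRON` values in this guide have six fields. Since `*/5 * * * * *` is described as "every 5 seconds", the first field is presumably seconds, which would make `0 */10 * * * *` fire at second 0 of every 10th minute. The sketch below splits such an expression into named fields under that assumption; `split_cron` and the field names are hypothetical and not part of the service.

```python
# Assumed field order for the six-field expressions used in this guide.
CRON_FIELDS = ("second", "minute", "hour", "day", "month", "day_of_week")


def split_cron(expression: str) -> dict[str, str]:
    """Split a six-field cron expression into a field-name -> value mapping."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))
```

A check like this catches the common mistake of pasting a standard five-field expression into a six-field setting.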

## Mapping to the Go Service Configuration

| Python config | Go config | Description |
|-----------|--------|------|
| config.dev.yaml | config/config.dev.yaml | Development configuration |
| config.prod.yaml | config/config.prod.yaml | Production configuration |
| ENV environment variable | ENV environment variable | Environment switching |
| database.username | database.username | Database username |
| database.dbname | database.dbname | Database name |

## Notes

1. **Password security**: change the database password in `config.prod.yaml` before deploying to production
2. **Git ignore**: the `.env` file is ignored by Git and is not committed to the repository
3. **Environment variables**: environment variables override the corresponding settings in the configuration file
4. **Scheduler frequency**: the development default is every 5 seconds; the production default is every 5 minutes
266
backend/DAMAI_PROXY_GUIDE.md
Normal file
@@ -0,0 +1,266 @@
# Damai Fixed Proxy IP Usage Guide

## 📋 Overview

This project integrates two Damai fixed proxy IPs that can be used for headless browser access, with full HTTP authentication support.

## 🌐 Proxy Configuration

### Proxy 1
- **Server**: `36.137.177.131:50001`
- **Username**: `qqwvy0`
- **Password**: `mun3r7xz`
- **Status**: ✅ Tested and working

### Proxy 2
- **Server**: `111.132.40.72:50002`
- **Username**: `ih3z07`
- **Password**: `078bt7o5`
- **Status**: ✅ Tested and working

## 📂 Related Files

| File | Description |
|--------|------|
| `damai_proxy_config.py` | Proxy configuration module |
| `test_damai_proxy.py` | Proxy test script |
| `example_use_damai_proxy.py` | Usage examples |

## 🚀 Quick Start

### 1. Test Proxy Availability

```bash
# Test all proxies
python test_damai_proxy.py

# Test a single proxy
python test_damai_proxy.py 0  # Test proxy 1
python test_damai_proxy.py 1  # Test proxy 2
```

### 2. Use in Code

#### Option 1: Use the configuration module

```python
from damai_proxy_config import get_proxy_1, get_proxy_2, get_random_proxy

# Get a specific proxy
proxy = get_proxy_1()  # or get_proxy_2()

# Get a random proxy
proxy = get_random_proxy()

print(proxy)
# Output: {'server': 'http://...', 'username': '...', 'password': '...'}
```

#### Option 2: Use with Playwright

```python
from playwright.async_api import async_playwright
from damai_proxy_config import get_proxy_1

async def use_proxy():
    proxy_config = get_proxy_1()

    playwright = await async_playwright().start()

    # Configure the proxy (with authentication)
    browser = await playwright.chromium.launch(
        headless=True,
        proxy={
            "server": proxy_config["server"],
            "username": proxy_config["username"],
            "password": proxy_config["password"]
        }
    )

    context = await browser.new_context()
    page = await context.new_page()

    # Visit the target site
    await page.goto("https://www.damai.cn/")

    await browser.close()
    await playwright.stop()
```

#### Option 3: Integrate with browser_pool

```python
from browser_pool import get_browser_pool
from damai_proxy_config import get_random_proxy

async def use_with_pool():
    # Get the proxy configuration
    proxy = get_random_proxy()

    # Note: browser_pool currently needs changes to support authenticated proxies
    pool = get_browser_pool()
    browser, context, page = await pool.get_browser(
        proxy=proxy["server"]  # Basic usage: server address only, no credentials
    )
```
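
Because `browser_pool` currently forwards only the server address, one possible change is a small helper that builds the full proxy settings (server plus credentials) for Playwright's `launch(proxy=...)` argument. The sketch below is an assumption about how that could look; `to_playwright_proxy` and `ProxySettings` are hypothetical names, not existing project code.

```python
from typing import Optional, TypedDict


class ProxySettings(TypedDict, total=False):
    server: str
    username: str
    password: str


def to_playwright_proxy(proxy: Optional[dict]) -> Optional[ProxySettings]:
    """Build the dict passed as launch(proxy=...), keeping credentials when present."""
    if not proxy:
        return None
    settings: ProxySettings = {"server": proxy["server"]}
    if proxy.get("username"):
        settings["username"] = proxy["username"]
        settings["password"] = proxy.get("password", "")
    return settings  # extra keys such as "name" or "enabled" are dropped
```

With a helper like this, the pool could accept either a bare server entry or a full proxy configuration and pass `to_playwright_proxy(cfg)` into `launch_kwargs["proxy"]`.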

## 🔧 API Reference

### damai_proxy_config.py

#### `get_proxy_config(index: int) -> dict`
Returns the proxy configuration at the given index.

**Parameters:**
- `index`: proxy index (0 or 1)

**Returns:**
```python
{
    "server": "http://...",
    "username": "...",
    "password": "..."
}
```

#### `get_proxy_1() -> dict`
Shortcut that returns the proxy 1 configuration.

#### `get_proxy_2() -> dict`
Shortcut that returns the proxy 2 configuration.

#### `get_random_proxy() -> dict`
Returns one available proxy chosen at random.

#### `get_all_enabled_proxies() -> list`
Returns the list of all enabled proxies.
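
To make the documented signatures concrete, here is a minimal sketch of a module that would satisfy them. It is not the contents of `damai_proxy_config.py`: the pool entries are placeholders, the `_public_view` helper is hypothetical, and the choice to raise on an invalid index or an empty pool is an assumption.

```python
import random

# Placeholder entries; the real pool is defined in damai_proxy_config.py.
DAMAI_PROXY_POOL = [
    {"name": "proxy-1", "server": "http://192.0.2.1:50001",
     "username": "user1", "password": "pass1", "enabled": True},
    {"name": "proxy-2", "server": "http://192.0.2.2:50002",
     "username": "user2", "password": "pass2", "enabled": True},
]


def _public_view(entry: dict) -> dict:
    """Keep only the keys listed in the documented return value."""
    return {key: entry[key] for key in ("server", "username", "password")}


def get_proxy_config(index: int) -> dict:
    if not 0 <= index < len(DAMAI_PROXY_POOL):
        raise IndexError(f"proxy index out of range: {index}")
    return _public_view(DAMAI_PROXY_POOL[index])


def get_proxy_1() -> dict:
    return get_proxy_config(0)


def get_proxy_2() -> dict:
    return get_proxy_config(1)


def get_all_enabled_proxies() -> list:
    return [_public_view(p) for p in DAMAI_PROXY_POOL if p.get("enabled", True)]


def get_random_proxy() -> dict:
    enabled = get_all_enabled_proxies()
    if not enabled:
        raise RuntimeError("no enabled proxy available")
    return random.choice(enabled)
```

Routing `get_random_proxy` through `get_all_enabled_proxies` means that a disabled entry is never handed out at random.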

## ✅ Test Results

All proxies have passed the following tests:

1. ✅ **IP detection test** - confirms the exit IP address matches the proxy
2. ✅ **Xiaohongshu access test** - the Xiaohongshu creator platform loads successfully
3. ✅ **Damai access test** - the Damai site loads successfully

### Sample Test Log

```
🔍 Starting test: Damai proxy 1
Proxy server: http://36.137.177.131:50001
Credentials: qqwvy0 / mun3r7xz
============================================================
✅ Playwright started
✅ Browser launched
✅ Browser context created
✅ Page created

📍 Test 1: visiting the IP detection site...
✅ Request succeeded
🌐 Current IP info:
{
  "origin": "36.137.177.131"
}

📍 Test 2: visiting the Xiaohongshu login page...
✅ Request succeeded
Page title: 小红书创作服务平台

📍 Test 3: visiting Damai...
✅ Request succeeded
Page title: 大麦网-全球演出赛事官方购票平台
```

## 🎯 Use Cases

1. **Anti-bot mitigation** - a fixed IP avoids the risk that comes with frequently changing IPs
2. **Regional restrictions** - use an IP from a specific region to access region-locked content
3. **Load balancing** - rotate across multiple proxies to spread request load
4. **Failover** - switch automatically to a backup proxy when one fails
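
The rotation and failover use cases above can be sketched as a small retry loop. This is an illustrative example only: `run_with_failover` is a hypothetical helper, and `task` stands in for whatever coroutine actually drives the browser with a given proxy.

```python
import asyncio
from itertools import cycle
from typing import Awaitable, Callable, Sequence


async def run_with_failover(
    proxies: Sequence[dict],
    task: Callable[[dict], Awaitable[str]],
    max_attempts: int = 3,
) -> str:
    """Run `task` with proxies in round-robin order until one attempt succeeds."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    last_error = None
    rotation = cycle(proxies)
    for _ in range(max_attempts):
        proxy = next(rotation)
        try:
            return await task(proxy)
        except Exception as exc:  # any failure moves on to the next proxy
            last_error = exc
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

In practice `task` could wrap the Playwright launch shown earlier, and the loop could add a delay between attempts if the target site rate-limits.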
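
## ⚠️ Notes

1. **Credential security**: the proxy usernames and passwords are hard-coded; use environment variables in production
2. **Proxy rotation**: implement a rotation mechanism so that a single IP is less likely to be banned
3. **Error handling**: add retry and switch-over logic for proxy failures
4. **Performance impact**: a proxy adds network latency; weigh this against your actual needs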
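
For the credential-security note, one option is to read each proxy from environment variables instead of source code. The sketch below assumes a hypothetical naming scheme (`<PREFIX>_SERVER`, `<PREFIX>_USERNAME`, `<PREFIX>_PASSWORD`); neither the variable names nor `load_proxy_from_env` exist in the project.

```python
import os


def load_proxy_from_env(prefix: str, environ=None) -> dict:
    """Read one proxy entry from <prefix>_SERVER / _USERNAME / _PASSWORD."""
    environ = os.environ if environ is None else environ
    fields = ("SERVER", "USERNAME", "PASSWORD")
    missing = [f"{prefix}_{name}" for name in fields if not environ.get(f"{prefix}_{name}")]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {name.lower(): environ[f"{prefix}_{name}"] for name in fields}
```

For example, `load_proxy_from_env("DAMAI_PROXY_1")` would return a dict with the same `server`, `username`, and `password` keys that the configuration module documents.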

## 🔄 Proxy Management

### Enabling/Disabling a Proxy

Edit `damai_proxy_config.py` and change the `enabled` field of the proxy entry:

```python
DAMAI_PROXY_POOL = [
    {
        "name": "大麦代理1",
        "server": "http://36.137.177.131:50001",
        "username": "qqwvy0",
        "password": "mun3r7xz",
        "enabled": True  # Set to False to disable this proxy
    },
    # ...
]
```

### Adding a New Proxy

Add a new entry to the `DAMAI_PROXY_POOL` list:

```python
{
    "name": "New proxy",
    "server": "http://ip:port",
    "username": "username",
    "password": "password",
    "enabled": True
}
```

## 📊 Performance Tests

Proxy response times observed during testing:
- IP detection: ~2-3 seconds
- Xiaohongshu: ~3-5 seconds
- Damai: ~3-5 seconds

## 🛠️ Troubleshooting

### Problem 1: Proxy connection times out
**Solutions**:
1. Check that the proxy server is online
2. Verify that the credentials are correct
3. Increase the connection timeout

### Problem 2: Authentication fails
**Solutions**:
1. Confirm that the username and password are correct
2. Check whether the proxy requires an IP allowlist
3. Contact the proxy provider to confirm the account status

### Problem 3: Access is denied
**Solutions**:
1. Switch to the other proxy
2. Check whether the target site has banned the proxy IP
3. Add appropriate request headers and delays

## 📝 Changelog

### 2025-12-26
- ✅ Initialized the Damai proxy configuration
- ✅ Completed test verification of both proxies
- ✅ Created the configuration module
- ✅ Added usage examples and documentation

## 📞 Support

If you run into proxy-related problems, check:
1. Whether the network connection is working
2. Whether the proxy provider has posted any announcements
3. Whether the proxy configuration is correct

---

**Last updated**: 2025-12-26
**Version**: 1.0.0
108
backend/LOGIN_PAGE_CONFIG.md
Normal file
@@ -0,0 +1,108 @@
# Login Page Configuration

## Overview

The page from which cookies are obtained during Xiaohongshu login can now be set in the configuration file. Two options are supported:
- **creator**: creator center (https://creator.xiaohongshu.com/login)
- **home**: Xiaohongshu home page (https://www.xiaohongshu.com)

## Configuration

### 1. Edit the Configuration File

Find the `login` section in `config.dev.yaml` or `config.prod.yaml`:

```yaml
# ========== Login/binding settings ==========
login:
  headless: false  # Browser mode used for login/binding
  page: "creator"  # Login page type: creator or home
```

Set `page` to the login page you want:
- `"creator"`: use the creator center login page
- `"home"`: log in through the Xiaohongshu home page

### 2. Restart the Service

Restart the Python backend service after changing the configuration so that it takes effect:

```bash
# Windows
cd backend
.\start.bat

# Linux
cd backend
./start.sh
```

## Overriding via API Parameter

Even with a default configured, an API request can still override it for that request through the `login_page` parameter:

```javascript
// Send verification code
POST /api/xhs/send-code
{
  "phone": "13800138000",
  "country_code": "+86",
  "login_page": "home"  // Optional; the config file default is used when omitted
}

// Log in
POST /api/xhs/login
{
  "phone": "13800138000",
  "code": "123456",
  "country_code": "+86",
  "login_page": "home",  // Optional; the config file default is used when omitted
  "session_id": "xxx"
}
```

## Priority

1. **Highest priority**: the `login_page` parameter in the API request
2. **Default**: the `login.page` setting in the configuration file
3. **Fallback**: `creator`, when neither of the above is set
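
The priority order above can be expressed as a short resolver. This is a sketch under assumptions: `resolve_login_page` is a hypothetical function name, `config` is assumed to be the parsed YAML as a nested dict, and rejecting unknown values with an error is a design choice not stated in this document.

```python
from typing import Optional

VALID_LOGIN_PAGES = ("creator", "home")
FALLBACK_LOGIN_PAGE = "creator"


def resolve_login_page(request_value: Optional[str], config: dict) -> str:
    """Pick the login page: API parameter, then login.page, then the fallback."""
    candidate = (
        request_value
        or config.get("login", {}).get("page")
        or FALLBACK_LOGIN_PAGE
    )
    if candidate not in VALID_LOGIN_PAGES:
        raise ValueError(f"unsupported login page: {candidate!r}")
    return candidate
```

For instance, a request carrying `"login_page": "home"` resolves to `home` even when the file sets `page: "creator"`.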

## Testing

Run the test script to verify that the configuration is correct:

```bash
cd backend
python test_login_page_config.py
```

## Scope of the Setting

Changing `login.page` affects the following features:

1. **Send-verification-code endpoint** (`/api/xhs/send-code`)
2. **Login endpoint** (`/api/xhs/login`)
3. **Browser pool preheat URL** (adjusted automatically based on the setting)
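
One way the preheat URL could follow the setting is a lookup from page type to URL. The mapping below uses the two URLs listed in this document; `LOGIN_PAGE_URLS` and `preheat_url_for` are hypothetical names, and how the service actually wires this into the browser pool is not shown here.

```python
LOGIN_PAGE_URLS = {
    "creator": "https://creator.xiaohongshu.com/login",
    "home": "https://www.xiaohongshu.com",
}


def preheat_url_for(login_page: str) -> str:
    """Return the URL the browser pool should preheat for a login page type."""
    try:
        return LOGIN_PAGE_URLS[login_page]
    except KeyError:
        raise ValueError(f"unsupported login page: {login_page!r}") from None
```

The result could then be passed to the pool's `preheat(target_url)` method at startup.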
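
## Notes

1. The HTML structure of the two login pages may differ slightly; if one causes problems, try the other
2. Test in the development environment before applying a change to production
3. Configuration changes take effect only after the service is restarted
4. When an API request explicitly passes `login_page`, that value takes precedence over the configuration file

## Example Scenarios

### Scenario 1: Use the creator center everywhere
```yaml
login:
  page: "creator"
```
When no API parameter is passed, every request logs in through the creator center.

### Scenario 2: Use the home page by default, with the creator center for selected requests
```yaml
login:
  page: "home"
```
Most requests use the home page, while an API call can pass `"login_page": "creator"` to switch for that request.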
203
backend/ali_sms_service.py
Normal file
@@ -0,0 +1,203 @@
"""
Alibaba Cloud SMS service module.
Used to send phone verification codes.
"""
import json
import random
import sys
from typing import Dict, Any, Optional
from datetime import datetime, timedelta

from alibabacloud_dysmsapi20170525.client import Client as Dysmsapi20170525Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_credentials.models import Config as CredentialConfig
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dysmsapi20170525 import models as dysmsapi_20170525_models
from alibabacloud_tea_util import models as util_models


class AliSmsService:
    """Alibaba Cloud SMS service."""

    def __init__(self, access_key_id: str, access_key_secret: str, sign_name: str, template_code: str):
        """
        Initialize the Alibaba Cloud SMS service.

        Args:
            access_key_id: Alibaba Cloud AccessKey ID
            access_key_secret: Alibaba Cloud AccessKey Secret
            sign_name: SMS signature
            template_code: SMS template code
        """
        self.sign_name = sign_name
        self.template_code = template_code

        # Create the Alibaba Cloud SMS client
        credential_config = CredentialConfig(
            type='access_key',
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        credential = CredentialClient(credential_config)
        config = open_api_models.Config(credential=credential)
        config.endpoint = 'dysmsapi.aliyuncs.com'

        self.client = Dysmsapi20170525Client(config)

        # Verification code cache (simple in-memory store; use Redis in production)
        self._code_cache: Dict[str, Dict[str, Any]] = {}

        # Verification code settings
        self.code_length = 6  # Number of digits in the code
        self.code_expire_minutes = 5  # Code lifetime in minutes

    def _generate_code(self) -> str:
        """Generate a random numeric verification code."""
        # SystemRandom draws from the OS entropy source, which suits one-time codes
        rng = random.SystemRandom()
        return ''.join(str(rng.randint(0, 9)) for _ in range(self.code_length))

    async def send_verification_code(self, phone: str) -> Dict[str, Any]:
        """
        Send a verification code to the given phone number.

        Args:
            phone: phone number

        Returns:
            Dict containing success status and error message if any
        """
        try:
            # Generate the verification code
            code = self._generate_code()

            print(f"[SMS] Sending verification code to {phone}, code: {code}", file=sys.stderr)

            # Build the SMS request
            send_sms_request = dysmsapi_20170525_models.SendSmsRequest(
                phone_numbers=phone,
                sign_name=self.sign_name,
                template_code=self.template_code,
                template_param=json.dumps({"code": code})
            )

            runtime = util_models.RuntimeOptions()

            # Send the SMS (note: this is a synchronous call inside a coroutine,
            # so it blocks the event loop until the request completes)
            try:
                resp = self.client.send_sms_with_options(send_sms_request, runtime)
                resp_dict = resp.to_map()

                print(f"[SMS] Alibaba Cloud response: {json.dumps(resp_dict, default=str, indent=2, ensure_ascii=False)}", file=sys.stderr)

                # Check the send result
                if resp_dict.get('body', {}).get('Code') == 'OK':
                    # Cache the verification code
                    self._code_cache[phone] = {
                        'code': code,
                        'expire_time': datetime.now() + timedelta(minutes=self.code_expire_minutes),
                        'sent_at': datetime.now()
                    }

                    print(f"[SMS] Verification code sent, phone: {phone}", file=sys.stderr)

                    return {
                        "success": True,
                        "message": f"Verification code sent; valid for {self.code_expire_minutes} minutes",
                        "code": code  # Returned for development only; remove in production
                    }
                else:
                    error_msg = resp_dict.get('body', {}).get('Message', 'unknown error')
                    print(f"[SMS] Send failed: {error_msg}", file=sys.stderr)
                    return {
                        "success": False,
                        "error": f"Failed to send SMS: {error_msg}"
                    }

            except Exception as e:
                error_msg = str(e)
                print(f"[SMS] Send raised an exception: {error_msg}", file=sys.stderr)

                # Print the diagnostic URL when one is provided
                if hasattr(e, 'data') and e.data:
                    recommend = e.data.get('Recommend')
                    if recommend:
                        print(f"[SMS] Diagnostic URL: {recommend}", file=sys.stderr)

                return {
                    "success": False,
                    "error": f"SMS send exception: {error_msg}"
                }

        except Exception as e:
            print(f"[SMS] Failed to send verification code: {str(e)}", file=sys.stderr)
            return {
                "success": False,
                "error": str(e)
            }

    def verify_code(self, phone: str, code: str) -> Dict[str, Any]:
        """
        Verify a phone number against a verification code.

        Args:
            phone: phone number
            code: verification code entered by the user

        Returns:
            Dict containing verification result
        """
        try:
            # Check that a code exists for this phone number
            if phone not in self._code_cache:
                return {
                    "success": False,
                    "error": "Verification code not sent or expired; please request a new one"
                }

            cached_data = self._code_cache[phone]

            # Check whether the code has expired
            if datetime.now() > cached_data['expire_time']:
                # Remove the expired code
                del self._code_cache[phone]
                return {
                    "success": False,
                    "error": "Verification code expired; please request a new one"
                }

            # Compare the codes
            if code == cached_data['code']:
                # Delete the code after a successful check (single use)
                del self._code_cache[phone]

                print(f"[SMS] Verification code accepted, phone: {phone}", file=sys.stderr)

                return {
                    "success": True,
                    "message": "Verification code accepted"
                }
            else:
                return {
                    "success": False,
                    "error": "Incorrect verification code; please try again"
                }

        except Exception as e:
            print(f"[SMS] Verification failed: {str(e)}", file=sys.stderr)
            return {
                "success": False,
                "error": str(e)
            }

    def cleanup_expired_codes(self):
        """Remove expired verification codes from the cache."""
        current_time = datetime.now()
        expired_phones = [
            phone for phone, data in self._code_cache.items()
            if current_time > data['expire_time']
        ]

        for phone in expired_phones:
            del self._code_cache[phone]

        if expired_phones:
            print(f"[SMS] Removed {len(expired_phones)} expired verification codes", file=sys.stderr)
553
backend/browser_pool.py
Normal file
@@ -0,0 +1,553 @@
"""
Browser pool management module.
Manages the lifecycle of Playwright browser instances and reuses them to improve performance.
"""
import asyncio
import time
from typing import Optional, Dict, Any
from playwright.async_api import async_playwright, Browser, BrowserContext, Page
import sys


class BrowserPool:
    """Browser pool manager (singleton pattern)."""

    def __init__(self, idle_timeout: int = 1800, max_instances: int = 5, headless: bool = True):
        """
        Initialize the browser pool.

        Args:
            idle_timeout: idle timeout in seconds, default 30 minutes (disabled; the browser stays resident)
            max_instances: maximum number of browser instances, default 5
            headless: whether to run headless; False runs a visible browser (easier to debug)
        """
        self.playwright = None
        self.browser: Optional[Browser] = None
        self.context: Optional[BrowserContext] = None
        self.page: Optional[Page] = None
        self.last_used_time = 0
        self.idle_timeout = idle_timeout
        self.max_instances = max_instances
        self.headless = headless
        self.is_initializing = False
        self.init_lock = asyncio.Lock()
        self.is_preheated = False  # Whether the browser has been preheated

        # Temporary browser instances (for concurrent requests)
        self.temp_browsers: Dict[str, Dict] = {}  # {session_id: {browser, context, page, created_at}}
        self.temp_lock = asyncio.Lock()

        print(f"[BrowserPool] Created in resident mode (no automatic cleanup), max instances: {max_instances}", file=sys.stderr)

    async def get_browser(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
                          user_agent: Optional[str] = None, session_id: Optional[str] = None,
                          headless: Optional[bool] = None) -> tuple[Browser, BrowserContext, Page]:
        """
        Get a browser instance (reused or newly created).

        Args:
            cookies: optional list of cookies
            proxy: optional proxy address
            user_agent: optional custom User-Agent
            session_id: session ID used to tell concurrent requests apart
            headless: optional headless mode; the default configuration is used when None

        Returns:
            A (browser, context, page) tuple
        """
        # Fall back to the default configuration when headless is not specified
        if headless is None:
            headless = self.headless
        # Without a session ID, use the main browser
        if not session_id:
            async with self.init_lock:
                # Check whether the existing browser is usable
                if await self._is_browser_alive():
                    print("[BrowserPool] Reusing the main browser instance", file=sys.stderr)
                    self.last_used_time = time.time()

                    # Inject cookies into the existing context (no new context is created)
                    if cookies:
                        print(f"[BrowserPool] Injecting {len(cookies)} cookies into the existing context", file=sys.stderr)
                        await self.context.add_cookies(cookies)

                    return self.browser, self.context, self.page
                else:
                    # Create a new browser
                    print("[BrowserPool] Creating the main browser instance", file=sys.stderr)
                    await self._init_browser(cookies, proxy, user_agent)
                    self.last_used_time = time.time()
                    return self.browser, self.context, self.page

        # Concurrent requests: reuse or create a temporary browser
        else:
            async with self.temp_lock:
                # First check whether a temporary browser already exists for this session_id
                if session_id in self.temp_browsers:
                    print(f"[BrowserPool] Reusing the temporary browser for session {session_id}", file=sys.stderr)
                    browser_info = self.temp_browsers[session_id]
                    return browser_info["browser"], browser_info["context"], browser_info["page"]

                # Check whether the instance limit has been reached
                if len(self.temp_browsers) >= self.max_instances - 1:  # -1 reserves a slot for the main browser
                    print(f"[BrowserPool] ⚠️ Instance limit reached ({self.max_instances}), waiting for release...", file=sys.stderr)
                    # TODO: a wait queue could be implemented; for now, raise an error
                    raise Exception("Browser instance limit reached, please try again later")

                print(f"[BrowserPool] Creating a temporary browser for session {session_id} ({len(self.temp_browsers)+1}/{self.max_instances-1})", file=sys.stderr)

                # Create the temporary browser, passing the headless setting through
                browser, context, page = await self._create_temp_browser(cookies, proxy, user_agent, headless)

                # Store it in the temporary pool
                self.temp_browsers[session_id] = {
                    "browser": browser,
                    "context": context,
                    "page": page,
                    "created_at": time.time()
                }

                return browser, context, page

    async def _is_browser_alive(self) -> bool:
        """Check whether the browser is alive (no timeout check; the browser stays resident)."""
        if not self.browser or not self.context or not self.page:
            return False

        # Note: the idle timeout is no longer checked so that the browser stays resident
        # Original code:
        # if time.time() - self.last_used_time > self.idle_timeout:
        #     print(f"[BrowserPool] Browser idle timeout ({self.idle_timeout}s), rebuilding", file=sys.stderr)
        #     await self.close()
        #     return False

        # Check that the browser is still running
        try:
            # Fetch the page title to verify the connection
            await self.page.title()
            return True
        except Exception as e:
            print(f"[BrowserPool] Browser connection lost: {str(e)}", file=sys.stderr)
            await self.close()
            return False

    async def _init_browser(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
                            user_agent: Optional[str] = None):
        """Initialize a new browser instance."""
        try:
            # Start Playwright
            if not self.playwright:
                # On Windows, the event loop policy needs to be set
                if sys.platform == 'win32':
                    # Use ProactorEventLoop or SelectorEventLoop
                    try:
                        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
                    except Exception as e:
                        print(f"[BrowserPool] Warning: failed to set the event loop policy: {str(e)}", file=sys.stderr)

                self.playwright = await async_playwright().start()
                print("[BrowserPool] Playwright started", file=sys.stderr)

            # Launch the browser (performance-oriented configuration)
            launch_kwargs = {
                "headless": self.headless,  # Use the configured headless setting
                "args": [
                    '--disable-blink-features=AutomationControlled',  # Hide automation fingerprints
                    '--no-sandbox',  # Required on Linux
                    '--disable-setuid-sandbox',
                    '--disable-dev-shm-usage',  # Use /tmp instead of /dev/shm to avoid running out of shared memory

                    # Performance
                    '--disable-web-security',  # Disable the same-origin policy (faster loading)
                    '--disable-features=IsolateOrigins,site-per-process',  # Disable site isolation (better performance)
                    '--disable-site-isolation-trials',
                    '--enable-features=NetworkService,NetworkServiceInProcess',  # Network service tuning
                    '--disable-background-timer-throttling',  # Disable background timer throttling
                    '--disable-backgrounding-occluded-windows',
                    '--disable-renderer-backgrounding',  # Do not lower renderer process priority
                    '--disable-background-networking',

                    # Cache and storage
                    '--disk-cache-size=268435456',  # 256 MB disk cache
                    '--media-cache-size=134217728',  # 128 MB media cache

                    # Rendering (GPU support kept)
                    '--enable-gpu-rasterization',  # Enable GPU rasterization
                    '--enable-zero-copy',  # Zero-copy optimization
                    '--ignore-gpu-blocklist',  # Ignore the GPU blocklist
                    '--enable-accelerated-2d-canvas',  # Accelerated 2D canvas

                    # Network
                    '--enable-quic',  # Enable the QUIC protocol
                    '--enable-tcp-fast-open',  # TCP Fast Open
                    '--max-connections-per-host=10',  # Maximum connections per host

                    # Trim unneeded features
                    '--disable-extensions',
                    '--disable-breakpad',  # Disable crash reporting
                    '--disable-component-extensions-with-background-pages',
                    '--disable-ipc-flooding-protection',  # Disable IPC flooding protection (better performance)
                    '--disable-hang-monitor',  # Disable the hang monitor
                    '--disable-prompt-on-repost',
                    '--disable-domain-reliability',
                    '--disable-component-update',

                    # UI
                    '--hide-scrollbars',
                    '--mute-audio',
                    '--no-first-run',
                    '--no-default-browser-check',
                    '--metrics-recording-only',
                    '--force-color-profile=srgb',
                ],
            }
            if proxy:
                launch_kwargs["proxy"] = {"server": proxy}

            self.browser = await self.playwright.chromium.launch(**launch_kwargs)
            print("[BrowserPool] Chromium launched", file=sys.stderr)

            # Create the context
            await self._create_new_context(cookies, proxy, user_agent)

        except Exception as e:
            print(f"[BrowserPool] Failed to initialize the browser: {str(e)}", file=sys.stderr)
            await self.close()
            raise

    async def _create_new_context(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
                                  user_agent: Optional[str] = None):
        """Create a new browser context."""
        try:
            # Close the old context
            if self.context:
                await self.context.close()
                print("[BrowserPool] Closed the old context", file=sys.stderr)

            # Create the new context
            context_kwargs = {
                "viewport": {'width': 1280, 'height': 720},
                "user_agent": user_agent or 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            }
            self.context = await self.browser.new_context(**context_kwargs)

            # Inject cookies
            if cookies:
                await self.context.add_cookies(cookies)
                print(f"[BrowserPool] Injected {len(cookies)} cookies", file=sys.stderr)

            # Create the page
            self.page = await self.context.new_page()
            print("[BrowserPool] New page created", file=sys.stderr)

        except Exception as e:
            print(f"[BrowserPool] Failed to create the context: {str(e)}", file=sys.stderr)
            raise

    async def close(self):
        """Close the browser pool."""
        try:
            if self.page:
                await self.page.close()
                self.page = None
            if self.context:
                await self.context.close()
                self.context = None
            if self.browser:
                await self.browser.close()
                self.browser = None
            if self.playwright:
                await self.playwright.stop()
                self.playwright = None
            print("[BrowserPool] Browser closed", file=sys.stderr)
        except Exception as e:
            print(f"[BrowserPool] Error while closing the browser: {str(e)}", file=sys.stderr)

    async def cleanup_if_idle(self):
        """Clean up an idle browser (called by a scheduled task). Disabled; the browser stays resident."""
        # Note: automatic cleanup is disabled so that the browser stays resident
        # Original code:
        # if self.browser and time.time() - self.last_used_time > self.idle_timeout:
        #     print(f"[BrowserPool] Idle timeout detected, cleaning up the browser", file=sys.stderr)
        #     await self.close()
        pass  # No cleanup is performed

    async def preheat(self, target_url: str = "https://creator.xiaohongshu.com/login"):
        """
        Preheat the browser: initialize it ahead of time and open the target page.

        Args:
            target_url: page to preheat, defaults to the Xiaohongshu login page
        """
        try:
            print("[Preheat] Starting browser preheat...", file=sys.stderr)

            # Initialize the browser
            await self._init_browser()
            self.last_used_time = time.time()

            # Open the target page
            print(f"[Preheat] Visiting: {target_url}", file=sys.stderr)
            await self.page.goto(target_url, wait_until='domcontentloaded', timeout=45000)

            # Give the page a moment to finish loading
            await asyncio.sleep(1)

            self.is_preheated = True
            print("[Preheat] ✅ Preheat complete, browser is ready!", file=sys.stderr)
            print(f"[Preheat] Current page: {self.page.url}", file=sys.stderr)

        except Exception as e:
            print(f"[Preheat] ⚠️ Preheat failed: {str(e)}", file=sys.stderr)
            print("[Preheat] The browser will be initialized on first use instead", file=sys.stderr)
            self.is_preheated = False

    async def repreheat(self, target_url: str = "https://creator.xiaohongshu.com/login"):
        """
        Re-preheat: navigate the browser back to the target page in the background.
        Used after the main browser has been used, so that the next use is fast again.

        Important: if the browser is in use (temporary instances exist), skip preheating to avoid interference.

        Args:
            target_url: page to preheat, defaults to the Xiaohongshu login page
        """
        # Key optimization: check whether any temporary browser is in use
        if len(self.temp_browsers) > 0:
            print(f"[Re-preheat] {len(self.temp_browsers)} temporary browser(s) in use, skipping preheat to avoid interference", file=sys.stderr)
            return

        # Check whether the main browser is in use (judged by the last-used time)
        time_since_last_use = time.time() - self.last_used_time
        if time_since_last_use < 10:  # Used within the last 10 seconds; it may still be busy
            print(f"[Re-preheat] Main browser was used {time_since_last_use:.1f}s ago and may still be busy, skipping preheat", file=sys.stderr)
            return

        max_retries = 3
        retry_count = 0

        while retry_count < max_retries:
            try:
                # Check whether the main browser is alive
                if not await self._is_browser_alive():
                    print(f"[Re-preheat] Browser not initialized, running a full preheat (attempt {retry_count + 1}/{max_retries})", file=sys.stderr)
                    # preheat() handles its own errors and sets is_preheated accordingly,
                    # so the flag is not overwritten here
                    await self.preheat(target_url)
                    return

                # Check whether the browser is already on the target page
                current_url = self.page.url if self.page else ""
                if target_url in current_url:
                    print(f"[Re-preheat] Already on the target page, nothing to do: {current_url}", file=sys.stderr)
                    self.is_preheated = True
                    return

                print(f"[Re-preheat] Starting re-preheat... (attempt {retry_count + 1}/{max_retries})", file=sys.stderr)
                print(f"[Re-preheat] Current page: {current_url}", file=sys.stderr)

                # Check again for new temporary browsers (double check)
                if len(self.temp_browsers) > 0:
                    print("[Re-preheat] A new temporary browser was started, cancelling preheat", file=sys.stderr)
                    return

                # Open the target page
                print(f"[Re-preheat] Visiting: {target_url}", file=sys.stderr)
                await self.page.goto(target_url, wait_until='domcontentloaded', timeout=45000)

                # Extra wait to let the page finish loading
                await asyncio.sleep(2)

                # Verify that the page loaded correctly
                current_page_url = self.page.url
                if target_url in current_page_url or 'creator.xiaohongshu.com' in current_page_url:
                    self.is_preheated = True
                    self.last_used_time = time.time()
                    print("[Re-preheat] ✅ Re-preheat complete!", file=sys.stderr)
                    print(f"[Re-preheat] Current page: {current_page_url}", file=sys.stderr)
                    return  # Success; leave the retry loop
                else:
                    print(f"[Re-preheat] Page did not load correctly, expected: {target_url}, actual: {current_page_url}", file=sys.stderr)
                    raise Exception("Page did not load the target address")

            except Exception as e:
                retry_count += 1
                print(f"[Re-preheat] ⚠️ Re-preheat failed (attempt {retry_count}/{max_retries}): {str(e)}", file=sys.stderr)

                if retry_count < max_retries:
                    # Wait before retrying
                    await asyncio.sleep(2)
                    # Try to reinitialize the browser
                    try:
                        await self.close()  # Close the possibly broken browser
                    except Exception:
                        pass  # Ignore errors while closing
                else:
                    # Every retry failed
                    print("[Re-preheat] ❌ All retries failed, attempting a full preheat", file=sys.stderr)
                    try:
                        await self.close()  # Close the current browser first
                    except Exception:
                        pass
                    # Run a full preheat
                    try:
                        # preheat() sets is_preheated itself, including on failure
                        await self.preheat(target_url)
                        return
                    except Exception as final_error:
                        print(f"[Re-preheat] ❌ Final preheat failed as well: {str(final_error)}", file=sys.stderr)
                        self.is_preheated = False
                        # Even after a final failure, try to leave the browser in a usable state
                        try:
                            await self._init_browser()
                        except Exception:
                            pass

    async def _create_temp_browser(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
                                   user_agent: Optional[str] = None, headless: bool = True) -> tuple[Browser, BrowserContext, Page]:
        """Create a temporary browser instance (for concurrent requests).

        Args:
            cookies: list of cookies
            proxy: proxy address
            user_agent: custom User-Agent
            headless: whether to run headless
        """
        try:
            # Start Playwright (reuse the shared instance)
            if not self.playwright:
                if sys.platform == 'win32':
                    try:
                        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
                    except Exception as e:
                        print(f"[TempBrowser] Warning: failed to set the event loop policy: {str(e)}", file=sys.stderr)

                self.playwright = await async_playwright().start()

            # Launch the browser (temporary instance, performance-oriented configuration)
            launch_kwargs = {
                "headless": headless,  # Use the headless value passed in
                "args": [
                    '--disable-blink-features=AutomationControlled',
                    '--no-sandbox',
                    '--disable-setuid-sandbox',
                    '--disable-dev-shm-usage',

                    # Performance
                    '--disable-web-security',
                    '--disable-features=IsolateOrigins,site-per-process',
                    '--disable-site-isolation-trials',
                    '--enable-features=NetworkService,NetworkServiceInProcess',
                    '--disable-background-timer-throttling',
                    '--disable-backgrounding-occluded-windows',
                    '--disable-renderer-backgrounding',
                    '--disable-background-networking',

                    # Cache
                    '--disk-cache-size=268435456',
                    '--media-cache-size=134217728',

                    # Rendering
                    '--enable-gpu-rasterization',
                    '--enable-zero-copy',
                    '--ignore-gpu-blocklist',
                    '--enable-accelerated-2d-canvas',

                    # Network
                    '--enable-quic',
                    '--enable-tcp-fast-open',
                    '--max-connections-per-host=10',

                    # Trim unneeded features
                    '--disable-extensions',
                    '--disable-breakpad',
                    '--disable-component-extensions-with-background-pages',
                    '--disable-ipc-flooding-protection',
                    '--disable-hang-monitor',
                    '--disable-prompt-on-repost',
                    '--disable-domain-reliability',
                    '--disable-component-update',

                    # UI
                    '--hide-scrollbars',
                    '--mute-audio',
                    '--no-first-run',
                    '--no-default-browser-check',
                    '--metrics-recording-only',
                    '--force-color-profile=srgb',
                ],
            }
            if proxy:
                launch_kwargs["proxy"] = {"server": proxy}

            browser = await self.playwright.chromium.launch(**launch_kwargs)

            # Create the context
            context_kwargs = {
                "viewport": {'width': 1280, 'height': 720},
                "user_agent": user_agent or 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
            }
            context = await browser.new_context(**context_kwargs)

            # Inject cookies
            if cookies:
|
||||
await context.add_cookies(cookies)
|
||||
|
||||
# 创建页面
|
||||
page = await context.new_page()
|
||||
|
||||
return browser, context, page
|
||||
|
||||
except Exception as e:
|
||||
print(f"[临时浏览器] 创建失败: {str(e)}", file=sys.stderr)
|
||||
raise
|
||||
|
||||
async def release_temp_browser(self, session_id: str):
|
||||
"""释放临时浏览器"""
|
||||
async with self.temp_lock:
|
||||
if session_id in self.temp_browsers:
|
||||
browser_info = self.temp_browsers[session_id]
|
||||
try:
|
||||
await browser_info["page"].close()
|
||||
await browser_info["context"].close()
|
||||
await browser_info["browser"].close()
|
||||
print(f"[浏览器池] 已释放会话 {session_id} 的临时浏览器", file=sys.stderr)
|
||||
except Exception as e:
|
||||
print(f"[浏览器池] 释放临时浏览器异常: {str(e)}", file=sys.stderr)
|
||||
finally:
|
||||
del self.temp_browsers[session_id]
|
||||
|
||||
def get_stats(self) -> Dict[str, Any]:
|
||||
"""获取浏览器池统计信息"""
|
||||
return {
|
||||
"browser_alive": self.browser is not None,
|
||||
"context_alive": self.context is not None,
|
||||
"page_alive": self.page is not None,
|
||||
"is_preheated": self.is_preheated,
|
||||
"temp_browsers_count": len(self.temp_browsers),
|
||||
"max_instances": self.max_instances,
|
||||
"last_used_time": self.last_used_time,
|
||||
"idle_seconds": int(time.time() - self.last_used_time) if self.last_used_time > 0 else 0,
|
||||
"idle_timeout": self.idle_timeout
|
||||
}
|
||||
|
||||
|
||||
# 全局单例
|
||||
_browser_pool: Optional[BrowserPool] = None
|
||||
|
||||
|
||||
def get_browser_pool(idle_timeout: int = 1800, headless: bool = True) -> BrowserPool:
|
||||
"""获取全局浏览器池实例(单例)
|
||||
|
||||
Args:
|
||||
idle_timeout: 空闲超时时间(秒)
|
||||
headless: 是否使用无头模式,False为有头模式(方便调试)
|
||||
"""
|
||||
global _browser_pool
|
||||
if _browser_pool is None:
|
||||
print(f"[浏览器池] 创建单例,模式: {'headless' if headless else 'headed'}", file=sys.stderr)
|
||||
_browser_pool = BrowserPool(idle_timeout=idle_timeout, headless=headless)
|
||||
elif _browser_pool.headless != headless:
|
||||
# 如果headless配置变了,需要更新
|
||||
print(f"[浏览器池] 检测到headless配置变更: {_browser_pool.headless} -> {headless}", file=sys.stderr)
|
||||
_browser_pool.headless = headless
|
||||
return _browser_pool
|
||||
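The supplementary preheat in browser_pool.py retries a fixed number of times, sleeps between attempts, and falls back to a full preheat when every retry fails. That flow can be sketched generically as below. This is a minimal sketch and not the project's implementation: `retry_then_fallback`, `attempt`, `fallback`, and `delay` are hypothetical names introduced for illustration.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_then_fallback(
    attempt: Callable[[], Awaitable[T]],
    fallback: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    delay: float = 2.0,
) -> T:
    """Await `attempt` up to `max_retries` times, then await `fallback` once."""
    for retry_count in range(1, max_retries + 1):
        try:
            return await attempt()
        except Exception as exc:
            # asyncio.CancelledError is not an Exception subclass,
            # so task cancellation still propagates to the caller.
            print(f"attempt {retry_count}/{max_retries} failed: {exc}")
            if retry_count < max_retries:
                await asyncio.sleep(delay)
    # Every retry failed: hand over to the heavier recovery path.
    return await fallback()
```

Catching `Exception` rather than using a bare `except:` keeps Ctrl+C and task cancellation working while the loop is sleeping or retrying.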
66
backend/config.dev.yaml
Normal file
@@ -0,0 +1,66 @@
|
||||
# 小红书Python服务配置 - 开发环境
|
||||
|
||||
# ========== 服务配置 ==========
|
||||
server:
|
||||
host: "0.0.0.0"
|
||||
port: 8000
|
||||
debug: true
|
||||
reload: false # Windows环境不建议启用热重载
|
||||
|
||||
# ========== 数据库配置 ==========
|
||||
database:
|
||||
host: localhost
|
||||
port: 3306
|
||||
username: root
|
||||
password: JKjk20011115
|
||||
dbname: ai_wht
|
||||
charset: utf8mb4
|
||||
max_connections: 10
|
||||
min_connections: 2
|
||||
|
||||
# ========== 浏览器池配置 ==========
|
||||
browser_pool:
|
||||
idle_timeout: 1800 # 空闲超时(秒),已禁用自动清理,保持常驻
|
||||
max_instances: 5 # 最大浏览器实例数
|
||||
preheat_enabled: true # 是否启用预热
|
||||
preheat_url: "https://creator.xiaohongshu.com/login" # 预热URL(根据login.page自动调整)
|
||||
|
||||
# ========== 登录/绑定功能配置 ==========
|
||||
login:
|
||||
headless: false # 登录/绑定时的浏览器模式: false=有头模式(方便用户操作),true=无头模式
|
||||
page: "home" # 登录页面类型: creator=创作者中心(creator.xiaohongshu.com/login), home=小红书首页(www.xiaohongshu.com)
|
||||
|
||||
# ========== 定时发布调度器配置 ==========
|
||||
scheduler:
|
||||
enabled: true # 是否启用定时任务
|
||||
cron: "*/5 * * * * *" # Cron表达式(秒 分 时 日 月 周) - 每5秒执行一次(开发环境测试)
|
||||
max_concurrent: 2 # 最大并发发布数
|
||||
publish_timeout: 300 # 发布超时时间(秒)
|
||||
max_articles_per_user_per_run: 2 # 每轮每个用户最大发文数
|
||||
max_failures_per_user_per_run: 3 # 每轮每个用户最大失败次数(达到后暂停本轮后续发布)
|
||||
max_daily_articles_per_user: 6 # 每个用户每日最大发文数(自动发布)
|
||||
max_hourly_articles_per_user: 2 # 每个用户每小时最大发文数(自动发布)
|
||||
headless: false # 浏览器模式: false=有头模式(可调试),true=无头模式(生产环境)
|
||||
|
||||
# ========== 防封策略配置 ==========
|
||||
enable_random_ua: true # 启用随机User-Agent(防指纹识别)
|
||||
min_publish_interval: 30 # 最小发布间隔(秒),模拟真人行为
|
||||
max_publish_interval: 120 # 最大发布间隔(秒),模拟真人行为
|
||||
|
||||
# ========== 代理池配置 ==========
|
||||
proxy_pool:
|
||||
enabled: false # 默认关闭,按需开启
|
||||
api_url: "http://api.tianqiip.com/getip?secret=lu29e593&num=1&type=txt&port=1&mr=1&sign=4b81a62eaed89ba802a8f34053e2c964"
|
||||
|
||||
# ========== 阿里云短信配置 ==========
|
||||
ali_sms:
|
||||
access_key_id: "LTAI5tSMvnCJdqkZtCVWgh8R" # 从环境变量或配置文件读取
|
||||
access_key_secret: "nyFzXyIi47peVLK4wR2qqbPezmU79W" # 从环境变量或配置文件读取
|
||||
sign_name: "北京乐航时代科技" # 短信签名
|
||||
template_code: "SMS_486210104" # 短信模板CODE
|
||||
code_expire_minutes: 5 # 验证码有效期(分钟)
|
||||
|
||||
# ========== 日志配置 ==========
|
||||
logging:
|
||||
level: DEBUG
|
||||
format: "[%(asctime)s] [%(levelname)s] %(message)s"
|
||||
66
backend/config.prod.yaml
Normal file
@@ -0,0 +1,66 @@
|
||||
# 小红书Python服务配置 - 生产环境
|
||||
|
||||
# ========== 服务配置 ==========
|
||||
server:
|
||||
host: "0.0.0.0"
|
||||
port: 8020
|
||||
debug: false
|
||||
reload: false
|
||||
|
||||
# ========== 数据库配置 ==========
|
||||
database:
|
||||
host: 8.149.233.36
|
||||
port: 3306
|
||||
username: ai_wht_write
|
||||
password: 7aK_H2yvokVumr84lLNDt8fDBp6P
|
||||
dbname: ai_wht
|
||||
charset: utf8mb4
|
||||
max_connections: 20
|
||||
min_connections: 5
|
||||
|
||||
# ========== 浏览器池配置 ==========
|
||||
browser_pool:
|
||||
idle_timeout: 1800 # 空闲超时(秒),已禁用自动清理,保持常驻
|
||||
max_instances: 10 # 最大浏览器实例数(生产环境可以更多)
|
||||
preheat_enabled: true # 是否启用预热
|
||||
preheat_url: "https://creator.xiaohongshu.com/login" # 预热URL(根据login.page自动调整)
|
||||
|
||||
# ========== 登录/绑定功能配置 ==========
|
||||
login:
|
||||
headless: true # 登录/绑定时的浏览器模式: false=有头模式(方便用户操作),true=无头模式
|
||||
page: "home" # 登录页面类型: creator=创作者中心(creator.xiaohongshu.com/login), home=小红书首页(www.xiaohongshu.com)
|
||||
|
||||
# ========== 定时发布调度器配置 ==========
|
||||
scheduler:
|
||||
enabled: true # 是否启用定时任务
|
||||
cron: "0 */5 * * * *" # Cron表达式(秒 分 时 日 月 周) - 每5分钟执行一次
|
||||
max_concurrent: 5 # 最大并发发布数
|
||||
publish_timeout: 300 # 发布超时时间(秒)
|
||||
max_articles_per_user_per_run: 5 # 每轮每个用户最大发文数
|
||||
max_failures_per_user_per_run: 3 # 每轮每个用户最大失败次数(达到后暂停本轮后续发布)
|
||||
max_daily_articles_per_user: 20 # 每个用户每日最大发文数(自动发布)
|
||||
max_hourly_articles_per_user: 3 # 每个用户每小时最大发文数(自动发布)
|
||||
headless: true # 浏览器模式: false=有头模式(可调试),true=无头模式(生产环境)
|
||||
|
||||
# ========== 防封策略配置 ==========
|
||||
enable_random_ua: true # 启用随机User-Agent(防指纹识别)
|
||||
min_publish_interval: 60 # 最小发布间隔(秒),生产环境建议60-300秒
|
||||
max_publish_interval: 300 # 最大发布间隔(秒),生产环境建议60-300秒
|
||||
|
||||
# ========== 代理池配置 ==========
|
||||
proxy_pool:
|
||||
enabled: false # 默认关闭,按需开启
|
||||
api_url: "http://api.tianqiip.com/getip?secret=lu29e593&num=1&type=txt&port=1&mr=1&sign=4b81a62eaed89ba802a8f34053e2c964"
|
||||
|
||||
# ========== 阿里云短信配置 ==========
|
||||
ali_sms:
|
||||
access_key_id: "LTAI5tSMvnCJdqkZtCVWgh8R" # 生产环境建议使用环境变量
|
||||
access_key_secret: "nyFzXyIi47peVLK4wR2qqbPezmU79W" # 生产环境建议使用环境变量
|
||||
sign_name: "北京乐航时代科技" # 短信签名
|
||||
template_code: "SMS_486210104" # 短信模板CODE
|
||||
code_expire_minutes: 5 # 验证码有效期(分钟)
|
||||
|
||||
# ========== 日志配置 ==========
|
||||
logging:
|
||||
level: INFO
|
||||
format: "[%(asctime)s] [%(levelname)s] %(message)s"
|
||||
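Both YAML files use a six-field cron expression whose first field is seconds (second, minute, hour, day, month, weekday), so `*/5 * * * * *` fires every 5 seconds while `0 */5 * * * *` fires every 5 minutes. The sketch below splits such an expression into named fields so a misplaced field is easy to spot. The helper `split_cron` and the field names are assumptions made for illustration; the scheduler that actually consumes this value is not shown in this commit.

```python
from typing import Dict

# Field order as documented in the config comments: second minute hour day month weekday
CRON_FIELDS = ("second", "minute", "hour", "day", "month", "day_of_week")


def split_cron(expr: str) -> Dict[str, str]:
    """Split a six-field cron expression into a {field_name: value} mapping."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} cron fields, got {len(parts)}: {expr!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    print(split_cron("*/5 * * * * *"))   # dev: the step is on the seconds field
    print(split_cron("0 */5 * * * *"))   # prod: the step is on the minutes field
```

Rejecting five-field input up front matters here, because a standard five-field expression would otherwise be read with every field shifted by one position.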
146
backend/config.py
Normal file
@@ -0,0 +1,146 @@
|
||||
"""
|
||||
Configuration management module.
Loads configuration from YAML files; environment variables override file values.
"""
import os
import yaml
from typing import Any, Dict, Optional


class Config:
    """Configuration class."""

    def __init__(self, config_dict: Dict[str, Any]):
        self._config = config_dict

    def get(self, key: str, default=None):
        """Return a config value; nested keys are separated by dots."""
        keys = key.split('.')
        value = self._config

        for k in keys:
            if isinstance(value, dict):
                value = value.get(k)
                if value is None:
                    return default
            else:
                return default

        return value

    def get_dict(self, key: str) -> Dict[str, Any]:
        """Return a config value as a dict ({} if missing or not a dict)."""
        value = self.get(key)
        return value if isinstance(value, dict) else {}

    def get_int(self, key: str, default: int = 0) -> int:
        """Return a config value as an int."""
        value = self.get(key, default)
        try:
            return int(value)
        except (ValueError, TypeError):
            return default

    def get_bool(self, key: str, default: bool = False) -> bool:
        """Return a config value as a bool."""
        value = self.get(key, default)
        if isinstance(value, bool):
            return value
        if isinstance(value, str):
            return value.lower() in ('true', 'yes', '1', 'on')
        return bool(value)

    def get_str(self, key: str, default: str = '') -> str:
        """Return a config value as a string."""
        value = self.get(key, default)
        return str(value) if value is not None else default


def load_config(env: Optional[str] = None) -> Config:
    """
    Load the configuration file.

    Args:
        env: Environment name, one of: dev, prod.
             If omitted, it is read from the ENV environment variable (default: dev).

    Returns:
        A Config object.
    """
    # Determine the environment
    if env is None:
        env = os.getenv('ENV', 'dev')

    # Config file path
    config_file = f'config.{env}.yaml'
    config_path = os.path.join(os.path.dirname(__file__), config_file)

    if not os.path.exists(config_path):
        raise FileNotFoundError(f"Config file not found: {config_path}")

    # Load the YAML config; an empty file makes safe_load return None
    with open(config_path, 'r', encoding='utf-8') as f:
        config_dict = yaml.safe_load(f) or {}

    # Environment variable overrides (common settings only)
    # Database
    if os.getenv('DB_HOST'):
        config_dict.setdefault('database', {})['host'] = os.getenv('DB_HOST')
    if os.getenv('DB_PORT'):
        config_dict.setdefault('database', {})['port'] = int(os.getenv('DB_PORT'))
    if os.getenv('DB_USER'):
        config_dict.setdefault('database', {})['username'] = os.getenv('DB_USER')
    if os.getenv('DB_PASSWORD'):
        config_dict.setdefault('database', {})['password'] = os.getenv('DB_PASSWORD')
    if os.getenv('DB_NAME'):
        config_dict.setdefault('database', {})['dbname'] = os.getenv('DB_NAME')

    # Scheduler
    if os.getenv('SCHEDULER_ENABLED'):
        config_dict.setdefault('scheduler', {})['enabled'] = os.getenv('SCHEDULER_ENABLED').lower() == 'true'
    if os.getenv('SCHEDULER_CRON'):
        config_dict.setdefault('scheduler', {})['cron'] = os.getenv('SCHEDULER_CRON')
    if os.getenv('SCHEDULER_MAX_CONCURRENT'):
        config_dict.setdefault('scheduler', {})['max_concurrent'] = int(os.getenv('SCHEDULER_MAX_CONCURRENT'))
    if os.getenv('SCHEDULER_PUBLISH_TIMEOUT'):
        config_dict.setdefault('scheduler', {})['publish_timeout'] = int(os.getenv('SCHEDULER_PUBLISH_TIMEOUT'))
    if os.getenv('SCHEDULER_MAX_ARTICLES_PER_USER_PER_RUN'):
        config_dict.setdefault('scheduler', {})['max_articles_per_user_per_run'] = int(os.getenv('SCHEDULER_MAX_ARTICLES_PER_USER_PER_RUN'))
    if os.getenv('SCHEDULER_MAX_FAILURES_PER_USER_PER_RUN'):
        config_dict.setdefault('scheduler', {})['max_failures_per_user_per_run'] = int(os.getenv('SCHEDULER_MAX_FAILURES_PER_USER_PER_RUN'))
    if os.getenv('SCHEDULER_MAX_DAILY_ARTICLES_PER_USER'):
        config_dict.setdefault('scheduler', {})['max_daily_articles_per_user'] = int(os.getenv('SCHEDULER_MAX_DAILY_ARTICLES_PER_USER'))
    if os.getenv('SCHEDULER_MAX_HOURLY_ARTICLES_PER_USER'):
        config_dict.setdefault('scheduler', {})['max_hourly_articles_per_user'] = int(os.getenv('SCHEDULER_MAX_HOURLY_ARTICLES_PER_USER'))

    # Proxy pool
    if os.getenv('PROXY_POOL_ENABLED'):
        config_dict.setdefault('proxy_pool', {})['enabled'] = os.getenv('PROXY_POOL_ENABLED').lower() == 'true'
    if os.getenv('PROXY_POOL_API_URL'):
        config_dict.setdefault('proxy_pool', {})['api_url'] = os.getenv('PROXY_POOL_API_URL')

    print(f"[config] Loaded config file: {config_file}")
    print(f"[config] Environment: {env}")
    print(f"[config] Database: {config_dict.get('database', {}).get('host')}:{config_dict.get('database', {}).get('port')}")
    print(f"[config] Scheduler: {'enabled' if config_dict.get('scheduler', {}).get('enabled') else 'disabled'}")

    return Config(config_dict)


# Global config object
app_config: Optional[Config] = None


def init_config(env: Optional[str] = None):
    """Initialize the global config."""
    global app_config
    app_config = load_config(env)
    return app_config


def get_config() -> Config:
    """Return the global config object, loading it on first use."""
    global app_config
    if app_config is None:
        app_config = load_config()
    return app_config
|
||||
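The environment overrides in config.py repeat the same three steps for each variable: read it, convert it, and store it under a section. One way to keep that list maintainable is a lookup table, sketched below. This is an illustrative alternative rather than the module's actual code: `OVERRIDES`, `parse_bool`, and `apply_env_overrides` are hypothetical names, only a few of the variables are listed, and `parse_bool` accepts the same spellings as `Config.get_bool`, which is looser than the `== 'true'` comparison used in `load_config`.

```python
import os
from typing import Any, Callable, Dict, Mapping, Tuple


def parse_bool(raw: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return raw.strip().lower() in ("true", "yes", "1", "on")


# Environment variable name -> (config section, key, converter)
OVERRIDES: Dict[str, Tuple[str, str, Callable[[str], Any]]] = {
    "DB_HOST": ("database", "host", str),
    "DB_PORT": ("database", "port", int),
    "SCHEDULER_ENABLED": ("scheduler", "enabled", parse_bool),
    "SCHEDULER_MAX_CONCURRENT": ("scheduler", "max_concurrent", int),
}


def apply_env_overrides(
    config: Dict[str, Any],
    environ: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Overlay environment values onto `config` in place and return it."""
    for name, (section, key, convert) in OVERRIDES.items():
        raw = environ.get(name)
        if not raw:  # unset or empty: keep the value from the YAML file
            continue
        try:
            value = convert(raw)
        except ValueError as exc:
            # Name the offending variable instead of a bare int() traceback.
            raise ValueError(f"invalid value for {name}: {raw!r}") from exc
        config.setdefault(section, {})[key] = value
    return config
```

Passing `environ` explicitly makes the function easy to exercise without touching the real process environment, for example `apply_env_overrides({}, {"DB_PORT": "3307"})`.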
98
backend/damai_proxy_config.py
Normal file
@@ -0,0 +1,98 @@
|
||||
"""
|
||||
大麦固定代理IP配置
|
||||
用于在无头浏览器中使用固定代理IP
|
||||
"""
|
||||
|
||||
# 大麦固定代理IP池
|
||||
DAMAI_PROXY_POOL = [
|
||||
{
|
||||
"name": "大麦代理1",
|
||||
"server": "http://36.137.177.131:50001",
|
||||
"username": "qqwvy0",
|
||||
"password": "mun3r7xz",
|
||||
"enabled": True
|
||||
},
|
||||
{
|
||||
"name": "大麦代理2",
|
||||
"server": "http://111.132.40.72:50002",
|
||||
"username": "ih3z07",
|
||||
"password": "078bt7o5",
|
||||
"enabled": True
|
||||
}
|
||||
]
|
||||
|
||||
|
||||
def get_proxy_config(index: int = 0) -> dict:
    """
    Return the proxy configuration at the given index.

    Args:
        index: Proxy index (0 or 1)

    Returns:
        Proxy config dict containing server, username and password
    """
    if index < 0 or index >= len(DAMAI_PROXY_POOL):
        raise ValueError(f"Invalid proxy index: {index}; valid range: 0-{len(DAMAI_PROXY_POOL)-1}")

    proxy = DAMAI_PROXY_POOL[index]
    if not proxy.get("enabled", True):
        raise ValueError(f"Proxy is disabled: {proxy['name']}")

    return {
        "server": proxy["server"],
        "username": proxy["username"],
        "password": proxy["password"]
    }


def get_all_enabled_proxies() -> list:
    """
    Return every enabled proxy configuration.

    Returns:
        List of proxy config dicts
    """
    return [
        {
            "server": p["server"],
            "username": p["username"],
            "password": p["password"],
            "name": p["name"]
        }
        for p in DAMAI_PROXY_POOL
        if p.get("enabled", True)
    ]


def get_random_proxy() -> dict:
    """
    Return a randomly chosen enabled proxy configuration.

    Returns:
        Proxy config dict
    """
    import random
    enabled_proxies = [p for p in DAMAI_PROXY_POOL if p.get("enabled", True)]

    if not enabled_proxies:
        raise ValueError("No proxy available")

    proxy = random.choice(enabled_proxies)
    return {
        "server": proxy["server"],
        "username": proxy["username"],
        "password": proxy["password"],
        "name": proxy["name"]
    }


# Shortcuts
def get_proxy_1():
    """Return the configuration of proxy 1."""
    return get_proxy_config(0)


def get_proxy_2():
    """Return the configuration of proxy 2."""
    return get_proxy_config(1)
|
||||
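Callers that need a single proxy URL with embedded credentials, rather than the separate `server`, `username` and `password` fields returned above, should percent-encode the credentials, since characters such as `@` or `:` in a password would otherwise corrupt the URL. The sketch below shows one way to do that with the standard library. `to_proxy_url` is a hypothetical helper and the host and credentials are placeholders, not values from this repository.

```python
from urllib.parse import quote, urlsplit


def to_proxy_url(proxy: dict) -> str:
    """Build scheme://user:password@host:port from a proxy config dict."""
    parts = urlsplit(proxy["server"])
    # safe="" makes quote() escape every reserved character, including '@', ':' and '/'.
    user = quote(proxy["username"], safe="")
    password = quote(proxy["password"], safe="")
    return f"{parts.scheme}://{user}:{password}@{parts.netloc}"


if __name__ == "__main__":
    example = {
        "server": "http://proxy.example.com:8080",  # placeholder host
        "username": "user",
        "password": "p@ss:word",
    }
    print(to_proxy_url(example))
```

Using `urlsplit` instead of stripping a hard-coded `http://` prefix also keeps the scheme intact if a proxy entry ever uses a different one.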
202
backend/debug_login_page.py
Normal file
@@ -0,0 +1,202 @@
|
||||
"""
|
||||
小红书登录页面调试脚本
|
||||
用于调试登录页面结构和元素选择器
|
||||
"""
|
||||
import asyncio
|
||||
import sys
|
||||
from xhs_login import XHSLoginService
|
||||
|
||||
|
||||
async def debug_login_page(proxy_index: int = 0):
|
||||
"""
|
||||
调试登录页面,查看页面结构和可用元素
|
||||
"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"🔍 调试小红书登录页面")
|
||||
print(f"{'='*60}")
|
||||
|
||||
# 从代理配置获取代理信息
|
||||
from damai_proxy_config import get_proxy_config
|
||||
proxy_config = get_proxy_config(proxy_index)
|
||||
proxy_server = proxy_config['server'].replace('http://', '')
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
|
||||
|
||||
print(f"✅ 使用代理: 代理{proxy_index + 1}")
|
||||
print(f" 代理服务器: {proxy_config['server']}")
|
||||
|
||||
# 创建登录服务
|
||||
login_service = XHSLoginService(use_pool=False) # 不使用池,便于调试
|
||||
|
||||
try:
|
||||
# 初始化浏览器(使用代理)
|
||||
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
|
||||
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
|
||||
print("✅ 浏览器初始化成功(已启用代理)")
|
||||
|
||||
# 访问登录页面
|
||||
print(f"\n🌐 访问小红书创作者平台登录页...")
|
||||
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=30000)
|
||||
await asyncio.sleep(5) # 等待更长时间让页面完全加载
|
||||
|
||||
# 获取页面标题和URL
|
||||
title = await login_service.page.title()
|
||||
url = login_service.page.url
|
||||
print(f"✅ 页面加载完成")
|
||||
print(f" 标题: {title}")
|
||||
print(f" URL: {url}")
|
||||
|
||||
# 获取页面内容
|
||||
content = await login_service.page.content()
|
||||
print(f" 页面内容长度: {len(content)} 字符")
|
||||
|
||||
# 查找所有input元素
|
||||
print(f"\n🔍 查找所有input元素...")
|
||||
inputs = await login_service.page.query_selector_all('input')
|
||||
print(f" 找到 {len(inputs)} 个input元素")
|
||||
|
||||
for i, inp in enumerate(inputs):
|
||||
try:
|
||||
placeholder = await inp.get_attribute('placeholder')
|
||||
input_type = await inp.get_attribute('type')
|
||||
name = await inp.get_attribute('name')
|
||||
class_name = await inp.get_attribute('class')
|
||||
id_attr = await inp.get_attribute('id')
|
||||
|
||||
print(f" Input {i+1}:")
|
||||
print(f" - placeholder: {placeholder}")
|
||||
print(f" - type: {input_type}")
|
||||
print(f" - name: {name}")
|
||||
print(f" - id: {id_attr}")
|
||||
print(f" - class: {class_name}")
|
||||
except Exception as e:
|
||||
print(f" Input {i+1}: 获取属性失败 - {str(e)}")
|
||||
|
||||
# 查找所有可能的手机号输入框选择器
|
||||
print(f"\n🔍 尝试常见手机号输入框选择器...")
|
||||
phone_selectors = [
|
||||
'input[placeholder="手机号"]',
|
||||
'input[placeholder*="手机"]',
|
||||
'input[type="tel"]',
|
||||
'input[type="text"][placeholder*="号"]',
|
||||
'input[placeholder*="Phone"]',
|
||||
'input[name*="phone"]',
|
||||
'input[placeholder*="号码"]',
|
||||
'input[placeholder*="mobile"]',
|
||||
'input[placeholder*="Mobile"]'
|
||||
]
|
||||
|
||||
found_inputs = []
|
||||
for selector in phone_selectors:
|
||||
try:
|
||||
element = await login_service.page.query_selector(selector)
|
||||
if element:
|
||||
found_inputs.append((selector, element))
|
||||
placeholder = await element.get_attribute('placeholder')
|
||||
print(f" ✅ 找到: {selector} (placeholder: {placeholder})")
|
||||
except Exception as e:
|
||||
print(f" ❌ 选择器 {selector} 失败: {str(e)}")
|
||||
|
||||
if not found_inputs:
|
||||
print(" ❌ 未找到任何手机号相关输入框")
|
||||
|
||||
# 查找所有按钮元素
|
||||
print(f"\n🔍 查找所有button元素...")
|
||||
buttons = await login_service.page.query_selector_all('button')
|
||||
print(f" 找到 {len(buttons)} 个button元素")
|
||||
|
||||
for i, btn in enumerate(buttons[:10]): # 只显示前10个
|
||||
try:
|
||||
text = await btn.inner_text()
|
||||
class_name = await btn.get_attribute('class')
|
||||
id_attr = await btn.get_attribute('id')
|
||||
|
||||
print(f" Button {i+1}:")
|
||||
print(f" - text: '{text.strip()}'")
|
||||
print(f" - class: {class_name}")
|
||||
print(f" - id: {id_attr}")
|
||||
except Exception as e:
|
||||
print(f" Button {i+1}: 获取信息失败 - {str(e)}")
|
||||
|
||||
# 查找发送验证码按钮
|
||||
print(f"\n🔍 尝试常见发送验证码按钮选择器...")
|
||||
code_selectors = [
|
||||
'text="发送验证码"',
|
||||
'text="获取验证码"',
|
||||
'text="发送"',
|
||||
'text="获取"',
|
||||
'button:has-text("验证码")',
|
||||
'button:has-text("发送")',
|
||||
'button:has-text("获取")',
|
||||
'[class*="send"]',
|
||||
'[class*="code"]',
|
||||
'[class*="verify"]'
|
||||
]
|
||||
|
||||
found_buttons = []
|
||||
for selector in code_selectors:
|
||||
try:
|
||||
element = await login_service.page.query_selector(selector)
|
||||
if element:
|
||||
found_buttons.append((selector, element))
|
||||
text = await element.inner_text()
|
||||
print(f" ✅ 找到: {selector} (text: '{text.strip()}')")
|
||||
except Exception as e:
|
||||
print(f" ❌ 选择器 {selector} 失败: {str(e)}")
|
||||
|
||||
if not found_buttons:
|
||||
print(" ❌ 未找到任何验证码相关按钮")
|
||||
|
||||
# 打印页面HTML片段(用于分析结构)
|
||||
print(f"\n📄 页面HTML片段(前1000字符)...")
|
||||
print(content[:1000])
|
||||
|
||||
print(f"\n📄 页面HTML片段(1000-2000字符)...")
|
||||
print(content[1000:2000])
|
||||
|
||||
# 等待用户交互(保持浏览器打开)
|
||||
print(f"\n⏸️ 浏览器保持打开状态,您可以手动检查页面")
|
||||
print(f" URL: {url}")
|
||||
print(f" 按 Ctrl+C 关闭浏览器...")
|
||||
|
||||
try:
|
||||
while True:
|
||||
await asyncio.sleep(1)
|
||||
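except (KeyboardInterrupt, asyncio.CancelledError):  # under asyncio.run(), Ctrl+C usually reaches the coroutine as a cancellation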
|
||||
print(f"\n⏹️ 用户中断,关闭浏览器...")
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 调试过程异常: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
finally:
|
||||
await login_service.close_browser()
|
||||
|
||||
|
||||
async def main():
|
||||
"""主函数"""
|
||||
print("="*60)
|
||||
print("🔍 小红书登录页面调试工具")
|
||||
print("="*60)
|
||||
|
||||
print("\n此工具将帮助您分析小红书登录页面的结构")
|
||||
print("以便正确识别手机号输入框和验证码按钮")
|
||||
|
||||
proxy_choice = input("\n请选择代理 (0 或 1, 默认为0): ").strip()
|
||||
if proxy_choice not in ['0', '1']:
|
||||
proxy_choice = '0'
|
||||
proxy_idx = int(proxy_choice)
|
||||
|
||||
await debug_login_page(proxy_idx)
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("✅ 调试完成!")
|
||||
print("="*60)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Windows环境下设置事件循环策略
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
# 运行调试
|
||||
asyncio.run(main())
|
||||
BIN
backend/error_screenshot.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 352 KiB |
146
backend/error_screenshot.py
Normal file
@@ -0,0 +1,146 @@
|
||||
"""
|
||||
错误截图保存工具
|
||||
当发生错误时自动截图并保存,便于问题排查
|
||||
"""
|
||||
import os
|
||||
import sys
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
from playwright.async_api import Page
|
||||
|
||||
|
||||
# 截图保存目录
|
||||
SCREENSHOT_DIR = Path("error_screenshots")
|
||||
SCREENSHOT_DIR.mkdir(exist_ok=True)
|
||||
|
||||
|
||||
async def save_error_screenshot(
|
||||
page: Optional[Page],
|
||||
error_type: str,
|
||||
error_message: str = "",
|
||||
prefix: str = ""
|
||||
) -> Optional[str]:
|
||||
"""
|
||||
保存错误截图
|
||||
|
||||
Args:
|
||||
page: Playwright 页面对象
|
||||
error_type: 错误类型(如:login_failed, send_code_failed, publish_failed等)
|
||||
error_message: 错误信息(可选,会添加到日志)
|
||||
prefix: 文件名前缀(可选)
|
||||
|
||||
Returns:
|
||||
截图文件路径,失败返回None
|
||||
"""
|
||||
if not page:
|
||||
print("[错误截图] 页面对象为空,无法截图", file=sys.stderr)
|
||||
return None
|
||||
|
||||
try:
|
||||
# 生成文件名:年月日时分秒_错误类型.png
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
|
||||
# 清理错误类型字符串(移除特殊字符)
|
||||
safe_error_type = "".join(c for c in error_type if c.isalnum() or c in ('_', '-'))
|
||||
|
||||
# 组合文件名
|
||||
if prefix:
|
||||
filename = f"{prefix}_{timestamp}_{safe_error_type}.png"
|
||||
else:
|
||||
filename = f"{timestamp}_{safe_error_type}.png"
|
||||
|
||||
filepath = SCREENSHOT_DIR / filename
|
||||
|
||||
# 截图
|
||||
await page.screenshot(path=str(filepath), full_page=True)
|
||||
|
||||
# 打印日志
|
||||
print(f"[错误截图] 已保存: {filepath}", file=sys.stderr)
|
||||
if error_message:
|
||||
print(f"[错误截图] 错误信息: {error_message}", file=sys.stderr)
|
||||
|
||||
# 返回文件路径
|
||||
return str(filepath)
|
||||
|
||||
except Exception as e:
|
||||
print(f"[错误截图] 截图失败: {str(e)}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
|
||||
def cleanup_old_screenshots(days: int = 7):
|
||||
"""
|
||||
清理旧的错误截图
|
||||
|
||||
Args:
|
||||
days: 保留最近几天的截图,默认7天
|
||||
"""
|
||||
try:
|
||||
import time
|
||||
current_time = time.time()
|
||||
cutoff_time = current_time - (days * 24 * 60 * 60)
|
||||
|
||||
deleted_count = 0
|
||||
for file in SCREENSHOT_DIR.glob("*.png"):
|
||||
if file.stat().st_mtime < cutoff_time:
|
||||
file.unlink()
|
||||
deleted_count += 1
|
||||
|
||||
if deleted_count > 0:
|
||||
print(f"[错误截图] 已清理 {deleted_count} 个超过 {days} 天的旧截图", file=sys.stderr)
|
||||
|
||||
except Exception as e:
|
||||
print(f"[错误截图] 清理旧截图失败: {str(e)}", file=sys.stderr)
|
||||
|
||||
|
||||
async def save_screenshot_with_html(
|
||||
page: Optional[Page],
|
||||
error_type: str,
|
||||
error_message: str = "",
|
||||
prefix: str = ""
|
||||
) -> tuple[Optional[str], Optional[str]]:
|
||||
"""
|
||||
保存错误截图和HTML源码(用于深度调试)
|
||||
|
||||
Args:
|
||||
page: Playwright 页面对象
|
||||
error_type: 错误类型
|
||||
error_message: 错误信息(可选)
|
||||
prefix: 文件名前缀(可选)
|
||||
|
||||
Returns:
|
||||
(截图路径, HTML路径),失败返回(None, None)
|
||||
"""
|
||||
if not page:
|
||||
return None, None
|
||||
|
||||
try:
|
||||
# 生成文件名
|
||||
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
|
||||
safe_error_type = "".join(c for c in error_type if c.isalnum() or c in ('_', '-'))
|
||||
|
||||
if prefix:
|
||||
base_filename = f"{prefix}_{timestamp}_{safe_error_type}"
|
||||
else:
|
||||
base_filename = f"{timestamp}_{safe_error_type}"
|
||||
|
||||
# 保存截图
|
||||
screenshot_path = SCREENSHOT_DIR / f"{base_filename}.png"
|
||||
await page.screenshot(path=str(screenshot_path), full_page=True)
|
||||
|
||||
# 保存HTML
|
||||
html_path = SCREENSHOT_DIR / f"{base_filename}.html"
|
||||
html_content = await page.content()
|
||||
with open(html_path, 'w', encoding='utf-8') as f:
|
||||
f.write(html_content)
|
||||
|
||||
print(f"[错误截图] 已保存截图: {screenshot_path}", file=sys.stderr)
|
||||
print(f"[错误截图] 已保存HTML: {html_path}", file=sys.stderr)
|
||||
if error_message:
|
||||
print(f"[错误截图] 错误信息: {error_message}", file=sys.stderr)
|
||||
|
||||
return str(screenshot_path), str(html_path)
|
||||
|
||||
except Exception as e:
|
||||
print(f"[错误截图] 保存截图和HTML失败: {str(e)}", file=sys.stderr)
|
||||
return None, None
|
||||
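`save_error_screenshot` and `save_screenshot_with_html` both build the file name the same way: a timestamp, the error type with unsafe characters removed, and an optional prefix. That shared step could be factored out as sketched below. `build_base_filename` and its `now` parameter are hypothetical, and the `"unknown"` fallback for an error type that sanitizes to an empty string is an addition that the original code does not have.

```python
from datetime import datetime
from typing import Optional


def build_base_filename(
    error_type: str,
    prefix: str = "",
    now: Optional[datetime] = None,
) -> str:
    """Return '[prefix_]YYYYmmdd_HHMMSS_errortype' without a file extension."""
    timestamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # Keep only letters, digits, '_' and '-' so the name is safe on any filesystem.
    safe_error_type = "".join(
        c for c in error_type if c.isalnum() or c in ("_", "-")
    ) or "unknown"
    parts = [p for p in (prefix, timestamp, safe_error_type) if p]
    return "_".join(parts)


if __name__ == "__main__":
    print(build_base_filename("login_failed", prefix="xhs") + ".png")
```

The injectable `now` argument exists only so the result is deterministic in tests; callers would normally omit it.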
Binary file not shown.
|
After Width: | Height: | Size: 166 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 245 KiB |
200
backend/example_use_damai_proxy.py
Normal file
@@ -0,0 +1,200 @@
|
||||
"""
|
||||
大麦固定代理使用示例
|
||||
演示如何在实际项目中使用固定代理IP
|
||||
"""
|
||||
import asyncio
|
||||
import sys
|
||||
from browser_pool import get_browser_pool
|
||||
from damai_proxy_config import get_proxy_1, get_proxy_2, get_random_proxy
|
||||
|
||||
|
||||
async def example1_use_specific_proxy():
|
||||
"""示例1: 使用指定的代理IP"""
|
||||
print("\n" + "="*60)
|
||||
print("示例1: 使用指定的代理IP(代理1)")
|
||||
print("="*60)
|
||||
|
||||
# 获取代理1的配置
|
||||
proxy_config = get_proxy_1()
|
||||
print(f"📌 使用代理: {proxy_config['server']}")
|
||||
|
||||
# 获取浏览器池
|
||||
pool = get_browser_pool()
|
||||
|
||||
try:
|
||||
# 获取浏览器实例(带代理)
|
||||
# 注意:需要修改browser_pool以支持带认证的代理
|
||||
browser, context, page = await pool.get_browser(
|
||||
proxy=proxy_config["server"]
|
||||
)
|
||||
|
||||
# 访问测试页面
|
||||
print("🌐 访问IP检测页面...")
|
||||
await page.goto("http://httpbin.org/ip", timeout=30000)
|
||||
|
||||
# 获取IP信息
|
||||
ip_info = await page.evaluate("() => document.body.innerText")
|
||||
print(f"✅ 当前IP:\n{ip_info}")
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 错误: {str(e)}")
|
||||
|
||||
|
||||
async def example2_use_random_proxy():
|
||||
"""示例2: 随机使用一个代理IP"""
|
||||
print("\n" + "="*60)
|
||||
print("示例2: 随机使用一个代理IP")
|
||||
print("="*60)
|
||||
|
||||
# 随机获取一个代理
|
||||
proxy_config = get_random_proxy()
|
||||
print(f"📌 随机选择代理: {proxy_config['name']}")
|
||||
print(f" 服务器: {proxy_config['server']}")
|
||||
|
||||
# 后续操作类似示例1
|
||||
print("✅ 代理配置已获取,可以用于浏览器实例化")
|
||||
|
||||
|
||||
async def example3_use_with_playwright_directly():
|
||||
"""示例3: 直接在Playwright中使用代理(带认证)"""
|
||||
print("\n" + "="*60)
|
||||
print("示例3: 直接在Playwright中使用代理(完整认证)")
|
||||
print("="*60)
|
||||
|
||||
from playwright.async_api import async_playwright
|
||||
|
||||
# 获取代理配置
|
||||
proxy_config = get_proxy_2()
|
||||
print(f"📌 使用代理2: {proxy_config['server']}")
|
||||
|
||||
playwright = None
|
||||
browser = None
|
||||
|
||||
try:
|
||||
# 启动Playwright
|
||||
playwright = await async_playwright().start()
|
||||
|
||||
# 配置代理(完整配置,包含认证信息)
|
||||
proxy_settings = {
|
||||
"server": proxy_config["server"],
|
||||
"username": proxy_config["username"],
|
||||
"password": proxy_config["password"]
|
||||
}
|
||||
|
||||
# 启动浏览器
|
||||
browser = await playwright.chromium.launch(
|
||||
headless=True,
|
||||
proxy=proxy_settings,
|
||||
args=['--disable-blink-features=AutomationControlled']
|
||||
)
|
||||
|
||||
# 创建上下文和页面
|
||||
context = await browser.new_context()
|
||||
page = await context.new_page()
|
||||
|
||||
# 访问测试页面
|
||||
print("🌐 访问大麦网...")
|
||||
await page.goto("https://www.damai.cn/", timeout=30000)
|
||||
|
||||
title = await page.title()
|
||||
print(f"✅ 页面标题: {title}")
|
||||
print(f" 当前URL: {page.url}")
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 错误: {str(e)}")
|
||||
|
||||
finally:
|
||||
if browser:
|
||||
await browser.close()
|
||||
if playwright:
|
||||
await playwright.stop()
|
||||
|
||||
|
||||
async def example4_switch_proxy_on_error():
|
||||
"""示例4: 代理失败时自动切换"""
|
||||
print("\n" + "="*60)
|
||||
print("示例4: 代理失败时自动切换到另一个代理")
|
||||
print("="*60)
|
||||
|
||||
from damai_proxy_config import get_all_enabled_proxies
|
||||
from playwright.async_api import async_playwright
|
||||
|
||||
proxies = get_all_enabled_proxies()
|
||||
print(f"📊 可用代理数: {len(proxies)}")
|
||||
|
||||
for i, proxy_config in enumerate(proxies):
|
||||
print(f"\n🔄 尝试代理 {i+1}/{len(proxies)}: {proxy_config['name']}")
|
||||
|
||||
playwright = None
|
||||
browser = None
|
||||
|
||||
try:
|
||||
# 启动Playwright
|
||||
playwright = await async_playwright().start()
|
||||
|
||||
# 配置代理
|
||||
proxy_settings = {
|
||||
"server": proxy_config["server"],
|
||||
"username": proxy_config["username"],
|
||||
"password": proxy_config["password"]
|
||||
}
|
||||
|
||||
# 启动浏览器
|
||||
browser = await playwright.chromium.launch(
|
||||
headless=True,
|
||||
proxy=proxy_settings
|
||||
)
|
||||
|
||||
context = await browser.new_context()
|
||||
page = await context.new_page()
|
||||
|
||||
# 测试访问
|
||||
await page.goto("http://httpbin.org/ip", timeout=15000)
|
||||
ip_info = await page.evaluate("() => document.body.innerText")
|
||||
|
||||
print(f"✅ {proxy_config['name']} 可用")
|
||||
print(f" IP信息: {ip_info.strip()}")
|
||||
|
||||
# 成功则退出循环
|
||||
await browser.close()
|
||||
await playwright.stop()
|
||||
break
|
||||
|
||||
except Exception as e:
|
||||
print(f"⚠️ {proxy_config['name']} 不可用: {str(e)}")
|
||||
if browser:
|
||||
await browser.close()
|
||||
if playwright:
|
||||
await playwright.stop()
|
||||
|
||||
# 如果是最后一个代理也失败,则报错
|
||||
if i == len(proxies) - 1:
|
||||
print("❌ 所有代理都不可用!")
|
||||
|
||||
|
||||
async def main():
|
||||
"""运行所有示例"""
|
||||
# Windows环境下设置事件循环策略
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
print("\n" + "🎯"*30)
|
||||
print("大麦固定代理IP使用示例集")
|
||||
print("🎯"*30)
|
||||
|
||||
# 示例2: 随机代理
|
||||
await example2_use_random_proxy()
|
||||
|
||||
# 示例3: 完整的Playwright代理使用
|
||||
await example3_use_with_playwright_directly()
|
||||
|
||||
# 示例4: 代理容错切换
|
||||
await example4_switch_proxy_on_error()
|
||||
|
||||
print("\n" + "="*60)
|
||||
print("🎉 所有示例运行完成!")
|
||||
print("="*60)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
asyncio.run(main())
|
||||
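The failover loop in example 4 can be reduced to a small reusable helper. The following is a minimal sketch, not the project's implementation: `first_working_proxy` and the `probe` callback are hypothetical names, and the probe stands in for the "launch browser and load a test page" step shown above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional, Sequence

ProxyConfig = Dict[str, Any]


async def first_working_proxy(
    proxies: Sequence[ProxyConfig],
    probe: Callable[[ProxyConfig], Awaitable[None]],
) -> Optional[ProxyConfig]:
    """Return the first proxy whose probe completes without raising, else None.

    `probe` is an assumed callback: it should raise on any failure
    (timeout, connection refused, bad credentials, ...).
    """
    for index, proxy in enumerate(proxies, start=1):
        try:
            await probe(proxy)
        except Exception as exc:  # any failure means "try the next proxy"
            print(f"proxy {index}/{len(proxies)} ({proxy.get('name')}) failed: {exc}")
            continue
        return proxy
    return None
```

Keeping the probe separate from the loop means resource cleanup (closing the browser, stopping Playwright) can live in one `try/finally` inside the probe instead of being repeated in both the success and failure branches.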
658
backend/main.py
@@ -1,14 +1,32 @@
|
||||
# Windows兼容性:必须在任何异步操作之前设置事件循环策略
|
||||
import sys
|
||||
import asyncio
|
||||
import aiohttp
|
||||
import json
|
||||
if sys.platform == 'win32':
|
||||
# Windows下使用ProactorEventLoopPolicy来支持Playwright的子进程
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
print("[系统] Windows环境已设置ProactorEventLoopPolicy", file=sys.stderr)
|
||||
|
||||
# 加载配置
|
||||
from config import init_config, get_config
|
||||
from dotenv import load_dotenv
|
||||
load_dotenv() # 从 .env 文件加载环境变量(可选,用于覆盖配置文件)
|
||||
|
||||
from fastapi import FastAPI, HTTPException, File, UploadFile, Form
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from pydantic import BaseModel
|
||||
from typing import Optional, Dict, Any, List
|
||||
import asyncio
|
||||
from datetime import datetime
|
||||
import os
|
||||
import shutil
|
||||
from pathlib import Path
|
||||
|
||||
from xhs_login import XHSLoginService
|
||||
from browser_pool import get_browser_pool
|
||||
from scheduler import XHSScheduler
|
||||
from error_screenshot import cleanup_old_screenshots
|
||||
from ali_sms_service import AliSmsService
|
||||
|
||||
app = FastAPI(title="小红书登录API")
|
||||
|
||||
@@ -21,8 +39,54 @@ app.add_middleware(
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
# 全局登录服务实例
|
||||
login_service = XHSLoginService()
|
||||
# 全局登录服务实例(延迟初始化,避免在startup前创建浏览器池)
|
||||
login_service = None
|
||||
|
||||
# 全局浏览器池实例(在startup时初始化)
|
||||
browser_pool = None
|
||||
|
||||
# 全局调度器实例
|
||||
scheduler = None
|
||||
|
||||
# 全局阿里云短信服务实例
|
||||
sms_service = None
|
||||
|
||||
|
||||
async def fetch_proxy_from_pool() -> Optional[str]:
|
||||
"""从代理池接口获取一个代理地址(http://ip:port),获取失败返回None"""
|
||||
config = get_config()
|
||||
if not config.get_bool('proxy_pool.enabled', False):
|
||||
return None
|
||||
|
||||
api_url = config.get_str('proxy_pool.api_url', '')
|
||||
if not api_url:
|
||||
return None
|
||||
|
||||
try:
|
||||
timeout = aiohttp.ClientTimeout(total=10)
|
||||
async with aiohttp.ClientSession(timeout=timeout) as session:
|
||||
async with session.get(api_url) as resp:
|
||||
if resp.status != 200:
|
||||
print(f"[代理池] 接口返回非200状态码: {resp.status}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
text = (await resp.text()).strip()
|
||||
if not text:
|
||||
print("[代理池] 返回内容为空", file=sys.stderr)
|
||||
return None
|
||||
|
||||
line = text.splitlines()[0].strip()
|
||||
if not line:
|
||||
print("[代理池] 首行内容为空", file=sys.stderr)
|
||||
return None
|
||||
|
||||
if line.startswith("http://") or line.startswith("https://"):
|
||||
return line
|
||||
return "http://" + line
|
||||
except Exception as e:
|
||||
print(f"[代理池] 请求失败: {str(e)}", file=sys.stderr)
|
||||
return None
|
||||
|
||||
|
||||
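The response handling in `fetch_proxy_from_pool` (take the first non-empty line and ensure it carries a scheme) is easier to test when separated from the HTTP call. A minimal sketch, assuming the pool returns plain text with one `ip:port` per line; `normalize_proxy_line` is a hypothetical helper name, not part of the commit:

```python
from typing import Optional


def normalize_proxy_line(body: str) -> Optional[str]:
    """Turn a proxy-pool response body into a proxy URL, or None if unusable."""
    text = body.strip()
    if not text:
        return None
    # Only the first line is used, mirroring the endpoint logic above.
    line = text.splitlines()[0].strip()
    if not line:
        return None
    if line.startswith(("http://", "https://")):
        return line
    # Bare "ip:port" entries are assumed to be plain HTTP proxies.
    return "http://" + line
```

For example, `normalize_proxy_line("1.2.3.4:8080\n5.6.7.8:9090")` yields `"http://1.2.3.4:8080"`, while an empty or whitespace-only body yields `None`.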
# 临时文件存储目录
|
||||
TEMP_DIR = Path("temp_uploads")
|
||||
@@ -32,11 +96,19 @@ TEMP_DIR.mkdir(exist_ok=True)
|
||||
class SendCodeRequest(BaseModel):
|
||||
phone: str
|
||||
country_code: str = "+86"
|
||||
login_page: Optional[str] = None # 登录页面:creator 或 home,为None时使用配置文件默认值
|
||||
|
||||
class VerifyCodeRequest(BaseModel):
|
||||
phone: str
|
||||
code: str
|
||||
country_code: str = "+86"
|
||||
|
||||
class LoginRequest(BaseModel):
|
||||
phone: str
|
||||
code: str
|
||||
country_code: str = "+86"
|
||||
login_page: Optional[str] = None # 登录页面:creator 或 home,为None时使用配置文件默认值
|
||||
session_id: Optional[str] = None # 可选:复用send-code接口的session_id
|
||||
|
||||
class PublishNoteRequest(BaseModel):
|
||||
title: str
|
||||
@@ -44,8 +116,20 @@ class PublishNoteRequest(BaseModel):
|
||||
images: Optional[list] = None
|
||||
topics: Optional[list] = None
|
||||
|
||||
class PublishWithCookiesRequest(BaseModel):
|
||||
cookies: Optional[list] = None # 兼容旧版,仅传Cookies
|
||||
login_state: Optional[dict] = None # 新版,传完整的login_state
|
||||
storage_state_path: Optional[str] = None # 新增:storage_state文件路径(最优先)
|
||||
phone: Optional[str] = None # 新增:手机号,用于查找storage_state文件
|
||||
title: str
|
||||
content: str
|
||||
images: Optional[list] = None
|
||||
topics: Optional[list] = None
|
||||
|
||||
class InjectCookiesRequest(BaseModel):
|
||||
cookies: list
|
||||
cookies: Optional[list] = None # 兼容旧版,仅传Cookies
|
||||
login_state: Optional[dict] = None # 新版,传完整的login_state
|
||||
target_page: Optional[str] = "creator" # 目标页面:creator 或 home
|
||||
|
||||
# 响应模型
|
||||
class BaseResponse(BaseModel):
|
||||
@@ -55,32 +139,241 @@ class BaseResponse(BaseModel):
|
||||
|
||||
@app.on_event("startup")
|
||||
async def startup_event():
|
||||
"""启动时不初始化浏览器,等待第一次请求时再初始化"""
|
||||
pass
|
||||
"""启动时启动后台清理任务和定时发布任务(已禁用预热)"""
|
||||
# 初始化配置(从ENV环境变量读取,默认dev)
|
||||
config = init_config()
|
||||
|
||||
print("[服务启动] FastAPI服务启动,浏览器池已就绪")
|
||||
|
||||
# 清理旧的错误截图(保留最近7天)
|
||||
try:
|
||||
cleanup_old_screenshots(days=7)
|
||||
except Exception as e:
|
||||
print(f"[启动] 清理旧截图失败: {str(e)}")
|
||||
|
||||
# 从配置文件读取headless参数
|
||||
headless = config.get_bool('scheduler.headless', True) # 定时发布的headless配置
|
||||
login_headless = config.get_bool('login.headless', False) # 登录/绑定的headless配置,默认为有头模式
|
||||
login_page = config.get_str('login.page', 'creator') # 登录页面类型,默认为创作者中心
|
||||
|
||||
# 根据配置自动调整预热URL
|
||||
if login_page == "home":
|
||||
preheat_url = "https://www.xiaohongshu.com"
|
||||
else:
|
||||
preheat_url = "https://creator.xiaohongshu.com/login"
|
||||
|
||||
# 初始化全局浏览器池(使用配置的headless参数)
|
||||
global browser_pool, login_service, sms_service
|
||||
browser_pool = get_browser_pool(idle_timeout=1800, headless=headless)
|
||||
print(f"[服务启动] 浏览器池模式: {'headless(无头模式)' if headless else 'headed(有头模式)'}")
|
||||
|
||||
# 初始化登录服务(使用独立的login.headless配置)
|
||||
login_service = XHSLoginService(use_pool=True, headless=login_headless)
|
||||
print(f"[服务启动] 登录服务模式: {'headless(无头模式)' if login_headless else 'headed(有头模式)'}")
|
||||
|
||||
# 初始化阿里云短信服务
|
||||
sms_dict = config.get_dict('ali_sms')
|
||||
sms_service = AliSmsService(
|
||||
access_key_id=sms_dict.get('access_key_id', ''),
|
||||
access_key_secret=sms_dict.get('access_key_secret', ''),
|
||||
sign_name=sms_dict.get('sign_name', ''),
|
||||
template_code=sms_dict.get('template_code', '')
|
||||
)
|
||||
print("[服务启动] 阿里云短信服务已初始化")
|
||||
|
||||
# 启动浏览器池清理任务
|
||||
asyncio.create_task(browser_cleanup_task())
|
||||
|
||||
# 已禁用预热功能,避免干扰正常业务流程
|
||||
# asyncio.create_task(browser_preheat_task())
|
||||
print("[服务启动] 浏览器预热功能已禁用")
|
||||
|
||||
# 启动定时发布任务
|
||||
global scheduler
|
||||
|
||||
# 从配置文件读取数据库配置
|
||||
db_dict = config.get_dict('database')
|
||||
db_config = {
|
||||
'host': db_dict.get('host', 'localhost'),
|
||||
'port': db_dict.get('port', 3306),
|
||||
'user': db_dict.get('username', 'root'),
|
||||
'password': db_dict.get('password', ''),
|
||||
'database': db_dict.get('dbname', 'ai_wht')
|
||||
}
|
||||
|
||||
# 从配置文件读取调度器配置
|
||||
scheduler_enabled = config.get_bool('scheduler.enabled', False)
|
||||
proxy_pool_enabled = config.get_bool('proxy_pool.enabled', False)
|
||||
proxy_pool_api_url = config.get_str('proxy_pool.api_url', '')
|
||||
enable_random_ua = config.get_bool('scheduler.enable_random_ua', True)
|
||||
min_publish_interval = config.get_int('scheduler.min_publish_interval', 30)
|
||||
max_publish_interval = config.get_int('scheduler.max_publish_interval', 120)
|
||||
# headless已经在上面读取了
|
||||
|
||||
if scheduler_enabled:
|
||||
scheduler = XHSScheduler(
|
||||
db_config=db_config,
|
||||
max_concurrent=config.get_int('scheduler.max_concurrent', 2),
|
||||
publish_timeout=config.get_int('scheduler.publish_timeout', 300),
|
||||
max_articles_per_user_per_run=config.get_int('scheduler.max_articles_per_user_per_run', 2),
|
||||
max_failures_per_user_per_run=config.get_int('scheduler.max_failures_per_user_per_run', 3),
|
||||
max_daily_articles_per_user=config.get_int('scheduler.max_daily_articles_per_user', 6),
|
||||
max_hourly_articles_per_user=config.get_int('scheduler.max_hourly_articles_per_user', 2),
|
||||
proxy_pool_enabled=proxy_pool_enabled,
|
||||
proxy_pool_api_url=proxy_pool_api_url,
|
||||
enable_random_ua=enable_random_ua,
|
||||
min_publish_interval=min_publish_interval,
|
||||
max_publish_interval=max_publish_interval,
|
||||
headless=headless, # 新增: 传递headless参数
|
||||
)
|
||||
|
||||
cron_expr = config.get_str('scheduler.cron', '*/5 * * * * *')
|
||||
scheduler.start(cron_expr)
|
||||
print(f"[服务启动] 定时发布任务已启动,Cron: {cron_expr}")
|
||||
else:
|
||||
print("[服务启动] 定时发布任务未启用")
|
||||
|
||||
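The startup code reads settings through dotted keys such as `scheduler.headless` and `proxy_pool.enabled`. The `config` module itself is not shown in this section, so the following is only a sketch of how such lookups could work over a nested dictionary loaded from YAML; `get_nested` and `get_bool` here are hypothetical stand-ins, not the project's `config.get_bool`.

```python
from collections.abc import Mapping
from typing import Any


def get_nested(data: Mapping, dotted_key: str, default: Any = None) -> Any:
    """Walk a nested mapping using a dotted key such as 'scheduler.headless'."""
    node: Any = data
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    return node


def get_bool(data: Mapping, dotted_key: str, default: bool = False) -> bool:
    """Read a boolean, accepting common string spellings (assumed convention)."""
    value = get_nested(data, dotted_key, default)
    if isinstance(value, str):
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)
```

Accepting string spellings matters when a value is overridden from an environment variable, where `"false"` would otherwise be truthy.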
async def browser_cleanup_task():
|
||||
"""后台任务:定期清理空闲浏览器"""
|
||||
while True:
|
||||
await asyncio.sleep(300) # 每5分钟检查一次
|
||||
try:
|
||||
await browser_pool.cleanup_if_idle()
|
||||
except Exception as e:
|
||||
print(f"[清理任务] 浏览器清理异常: {str(e)}")
|
||||
|
||||
async def browser_preheat_task():
|
||||
"""后台任务:预热浏览器"""
|
||||
try:
|
||||
# 延迟3秒启动,避免影响服务启动速度
|
||||
await asyncio.sleep(3)
|
||||
print("[预热任务] 开始预热浏览器...")
|
||||
await browser_pool.preheat("https://creator.xiaohongshu.com/login")
|
||||
except Exception as e:
|
||||
print(f"[预热任务] 预热失败: {str(e)}")
|
||||
|
||||
async def repreheat_browser_after_use():
|
||||
"""后台任务:使用后补充预热浏览器(仅用于登录流程)"""
|
||||
try:
|
||||
# 延迟5秒,确保:
|
||||
# 1. 响应已经返回给用户
|
||||
# 2. Cookie已经完全获取并保存
|
||||
# 3. 登录流程完全结束
|
||||
await asyncio.sleep(5)
|
||||
print("[补充预热任务] 开始补充预热浏览器...")
|
||||
await browser_pool.repreheat("https://creator.xiaohongshu.com/login")
|
||||
except Exception as e:
|
||||
print(f"[补充预热任务] 补充预热失败: {str(e)}")
|
||||
|
||||
@app.on_event("shutdown")
|
||||
async def shutdown_event():
|
||||
"""关闭时清理浏览器"""
|
||||
await login_service.close_browser()
|
||||
"""关闭时清理浏览器池和停止调度器"""
|
||||
print("[服务关闭] 正在关闭服务...")
|
||||
|
||||
# 停止调度器
|
||||
global scheduler
|
||||
if scheduler:
|
||||
scheduler.stop()
|
||||
print("[服务关闭] 调度器已停止")
|
||||
|
||||
# 关闭浏览器池
|
||||
await browser_pool.close()
|
||||
print("[服务关闭] 浏览器池已关闭")
|
||||
|
||||
@app.post("/api/xhs/send-code", response_model=BaseResponse)
|
||||
async def send_code(request: SendCodeRequest):
|
||||
"""
|
||||
发送验证码
|
||||
通过playwright访问小红书官网,输入手机号并触发验证码发送
|
||||
支持选择从创作者中心或小红书首页登录
|
||||
并发支持:为每个请求分配独立的浏览器实例
|
||||
"""
|
||||
# 使用手机号作为session_id,确保发送验证码和登录验证使用同一个浏览器
|
||||
session_id = f"xhs_login_{request.phone}"
|
||||
print(f"[发送验证码] session_id={session_id}, phone={request.phone}", file=sys.stderr)
|
||||
|
||||
# 获取配置中的默认login_page,如果API传入了则优先使用API参数
|
||||
config = get_config()
|
||||
default_login_page = config.get_str('login.page', 'creator')
|
||||
login_page = request.login_page if request.login_page else default_login_page
|
||||
|
||||
print(f"[发送验证码] 使用登录页面: {login_page} (配置默认={default_login_page}, API参数={request.login_page})", file=sys.stderr)
|
||||
|
||||
try:
|
||||
# 为此请求创建独立的登录服务实例,使用session_id实现并发隔离
|
||||
request_login_service = XHSLoginService(
|
||||
use_pool=True,
|
||||
headless=login_service.headless, # 使用配置文件中的login.headless配置
|
||||
session_id=session_id # 关键:传递session_id
|
||||
)
|
||||
|
||||
# 调用登录服务发送验证码
|
||||
result = await login_service.send_verification_code(
|
||||
result = await request_login_service.send_verification_code(
|
||||
phone=request.phone,
|
||||
country_code=request.country_code
|
||||
country_code=request.country_code,
|
||||
login_page=login_page # 传递登录页面参数
|
||||
)
|
||||
|
||||
if result["success"]:
|
||||
return BaseResponse(
|
||||
code=0,
|
||||
message="验证码已发送,请在小红书APP中查看",
|
||||
data={"sent_at": datetime.now().isoformat()}
|
||||
data={
|
||||
"sent_at": datetime.now().isoformat(),
|
||||
"session_id": session_id # 返回session_id供前端使用
|
||||
}
|
||||
)
|
||||
else:
|
||||
# 发送失败,释放临时浏览器
|
||||
if session_id and browser_pool:
|
||||
try:
|
||||
await browser_pool.release_temp_browser(session_id)
|
||||
print(f"[发送验证码] 已释放临时浏览器: {session_id}", file=sys.stderr)
|
||||
except Exception as e:
|
||||
print(f"[发送验证码] 释放临时浏览器失败: {str(e)}", file=sys.stderr)
|
||||
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message=result.get("error", "发送验证码失败"),
|
||||
data=None
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
print(f"发送验证码异常: {str(e)}", file=sys.stderr)
|
||||
|
||||
# 异常情况,释放临时浏览器
|
||||
if session_id and browser_pool:
|
||||
try:
|
||||
await browser_pool.release_temp_browser(session_id)
|
||||
print(f"[发送验证码] 已释放临时浏览器: {session_id}", file=sys.stderr)
|
||||
except Exception as release_error:
|
||||
print(f"[发送验证码] 释放临时浏览器失败: {str(release_error)}", file=sys.stderr)
|
||||
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message=f"发送验证码失败: {str(e)}",
|
||||
data=None
|
||||
)
|
||||
|
||||
@app.post("/api/xhs/phone/send-code", response_model=BaseResponse)
|
||||
async def send_phone_code(request: SendCodeRequest):
|
||||
"""
|
||||
发送手机短信验证码(使用阿里云短信服务)
|
||||
用于小红书手机号验证码登录
|
||||
"""
|
||||
try:
|
||||
# 调用阿里云短信服务发送验证码
|
||||
result = await sms_service.send_verification_code(request.phone)
|
||||
|
||||
if result["success"]:
|
||||
return BaseResponse(
|
||||
code=0,
|
||||
message=result.get("message", "验证码已发送"),
|
||||
data={
|
||||
"sent_at": datetime.now().isoformat(),
|
||||
# 开发环境返回验证码,生产环境应移除
|
||||
"code": result.get("code") if get_config().get_bool('server.debug', False) else None
|
||||
}
|
||||
)
|
||||
else:
|
||||
return BaseResponse(
|
||||
@@ -90,28 +383,104 @@ async def send_code(request: SendCodeRequest):
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
print(f"发送验证码异常: {str(e)}")
|
||||
print(f"发送短信验证码异常: {str(e)}")
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message=f"发送验证码失败: {str(e)}",
|
||||
data=None
|
||||
)
|
||||
|
||||
@app.post("/api/xhs/phone/verify-code", response_model=BaseResponse)
|
||||
async def verify_phone_code(request: VerifyCodeRequest):
|
||||
"""
|
||||
验证手机短信验证码
|
||||
用于小红书手机号验证码登录
|
||||
"""
|
||||
try:
|
||||
# 调用阿里云短信服务验证验证码
|
||||
result = sms_service.verify_code(request.phone, request.code)
|
||||
|
||||
if result["success"]:
|
||||
return BaseResponse(
|
||||
code=0,
|
||||
message="验证码验证成功",
|
||||
data={"verified_at": datetime.now().isoformat()}
|
||||
)
|
||||
else:
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message=result.get("error", "验证码验证失败"),
|
||||
data=None
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
print(f"验证验证码异常: {str(e)}")
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message=f"验证失败: {str(e)}",
|
||||
data=None
|
||||
)
|
||||
|
||||
@app.post("/api/xhs/login", response_model=BaseResponse)
|
||||
async def login(request: LoginRequest):
|
||||
"""
|
||||
登录验证
|
||||
用户填写验证码后,完成登录并获取小红书返回的数据
|
||||
支持选择从创作者中心或小红书首页登录
|
||||
并发支持:可复用send-code接口的session_id
|
||||
"""
|
||||
# 使用手机号作为session_id,复用发送验证码时的浏览器
|
||||
# 如果前端传了session_id就使用前端的,否则根据手机号生成
|
||||
if not request.session_id:
|
||||
session_id = f"xhs_login_{request.phone}"
|
||||
else:
|
||||
session_id = request.session_id
|
||||
|
||||
print(f"[登录验证] session_id={session_id}, phone={request.phone}", file=sys.stderr)
|
||||
|
||||
# 获取配置中的默认login_page,如果API传入了则优先使用API参数
|
||||
config = get_config()
|
||||
default_login_page = config.get_str('login.page', 'creator')
|
||||
login_page = request.login_page if request.login_page else default_login_page
|
||||
|
||||
print(f"[登录验证] 使用登录页面: {login_page} (配置默认={default_login_page}, API参数={request.login_page})", file=sys.stderr)
|
||||
|
||||
try:
|
||||
# 如果有session_id,复用send-code的浏览器;否则创建新的
|
||||
if session_id:
|
||||
print(f"[登录验证] 复用send-code的浏览器: {session_id}", file=sys.stderr)
|
||||
request_login_service = XHSLoginService(
|
||||
use_pool=True,
|
||||
headless=login_service.headless, # 使用配置文件中的login.headless配置
|
||||
session_id=session_id
|
||||
)
|
||||
# 初始化浏览器,以便从浏览器池获取临时浏览器
|
||||
await request_login_service.init_browser()
|
||||
else:
|
||||
# 旧逻辑:不传session_id,使用全局登录服务
|
||||
print(f"[登录验证] 使用全局登录服务(旧逻辑)", file=sys.stderr)
|
||||
request_login_service = login_service
|
||||
|
||||
# 调用登录服务进行登录
|
||||
result = await login_service.login(
|
||||
result = await request_login_service.login(
|
||||
phone=request.phone,
|
||||
code=request.code,
|
||||
country_code=request.country_code
|
||||
country_code=request.country_code,
|
||||
login_page=login_page # 传递登录页面参数
|
||||
)
|
||||
|
||||
# 释放临时浏览器(无论成功还是失败)
|
||||
if session_id and browser_pool:
|
||||
try:
|
||||
await browser_pool.release_temp_browser(session_id)
|
||||
print(f"[登录验证] 已释放临时浏览器: {session_id}", file=sys.stderr)
|
||||
except Exception as e:
|
||||
print(f"[登录验证] 释放临时浏览器失败: {str(e)}", file=sys.stderr)
|
||||
|
||||
if result["success"]:
|
||||
# 登录成功,不再触发预热(已禁用预热功能)
|
||||
# asyncio.create_task(repreheat_browser_after_use())
|
||||
|
||||
return BaseResponse(
|
||||
code=0,
|
||||
message="登录成功",
|
||||
@@ -119,10 +488,16 @@ async def login(request: LoginRequest):
|
||||
"user_info": result.get("user_info"),
|
||||
"cookies": result.get("cookies"), # 键值对格式(前端展示)
|
||||
"cookies_full": result.get("cookies_full"), # Playwright完整格式(数据库存储/脚本使用)
|
||||
"login_state": result.get("login_state"), # 完整登录状态(包含cookies + localStorage + sessionStorage)
|
||||
"localStorage": result.get("localStorage"), # localStorage数据
|
||||
"sessionStorage": result.get("sessionStorage"), # sessionStorage数据
|
||||
"url": result.get("url"), # 当前URL
|
||||
"storage_state_path": result.get("storage_state_path"), # storage_state文件路径
|
||||
"login_time": datetime.now().isoformat()
|
||||
}
|
||||
)
|
||||
else:
|
||||
# 登录失败
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message=result.get("error", "登录失败"),
|
||||
@@ -130,7 +505,16 @@ async def login(request: LoginRequest):
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
print(f"登录异常: {str(e)}")
|
||||
print(f"登录异常: {str(e)}", file=sys.stderr)
|
||||
|
||||
# 异常情况,释放临时浏览器
|
||||
if session_id and browser_pool:
|
||||
try:
|
||||
await browser_pool.release_temp_browser(session_id)
|
||||
print(f"[登录验证] 已释放临时浏览器: {session_id}", file=sys.stderr)
|
||||
except Exception as release_error:
|
||||
print(f"[登录验证] 释放临时浏览器失败: {str(release_error)}", file=sys.stderr)
|
||||
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message=f"登录失败: {str(e)}",
|
||||
@@ -140,31 +524,99 @@ async def login(request: LoginRequest):
|
||||
@app.get("/")
|
||||
async def root():
|
||||
"""健康检查"""
|
||||
return {"status": "ok", "message": "小红书登录服务运行中"}
|
||||
if browser_pool:
|
||||
stats = browser_pool.get_stats()
|
||||
return {
|
||||
"status": "ok",
|
||||
"message": "小红书登录服务运行中(浏览器池模式)",
|
||||
"browser_pool": stats
|
||||
}
|
||||
return {"status": "ok", "message": "服务初始化中..."}
|
||||
|
||||
@app.get("/api/health")
|
||||
async def health_check():
|
||||
"""健康检查接口(详细)"""
|
||||
if browser_pool:
|
||||
stats = browser_pool.get_stats()
|
||||
return {
|
||||
"status": "healthy",
|
||||
"service": "xhs-login-service",
|
||||
"mode": "browser-pool",
|
||||
"browser_pool_stats": stats,
|
||||
"timestamp": datetime.now().isoformat()
|
||||
}
|
||||
return {
|
||||
"status": "initializing",
|
||||
"service": "xhs-login-service",
|
||||
"timestamp": datetime.now().isoformat()
|
||||
}
|
||||
|
||||
@app.post("/api/xhs/inject-cookies", response_model=BaseResponse)
|
||||
async def inject_cookies(request: InjectCookiesRequest):
|
||||
"""
|
||||
注入Cookies并验证登录状态
|
||||
允许使用之前保存的Cookies跳过登录
|
||||
注入Cookies或完整登录状态并验证
|
||||
支持两种模式:
|
||||
1. 仅注入Cookies(兼容旧版)
|
||||
2. 注入完整login_state(包含Cookies + localStorage + sessionStorage)
|
||||
支持选择跳转到创作者中心或小红书首页
|
||||
|
||||
重要:为了避免检测,不使用浏览器池,每次创建全新的浏览器实例
|
||||
"""
|
||||
try:
|
||||
# 关闭旧的浏览器(如果有)
|
||||
if login_service.browser:
|
||||
await login_service.close_browser()
|
||||
|
||||
# 使用Cookies初始化浏览器
|
||||
await login_service.init_browser(cookies=request.cookies)
|
||||
# 创建一个独立的登录服务实例,不使用浏览器池
|
||||
print("✅ 为注入Cookie创建全新的浏览器实例,不使用浏览器池", file=sys.stderr)
|
||||
inject_service = XHSLoginService(use_pool=False, headless=False) # 不使用浏览器池,使用有头模式方便调试
|
||||
|
||||
# 验证登录状态
|
||||
result = await login_service.verify_login_status()
|
||||
# 优先使用login_state,其次使用cookies
|
||||
if request.login_state:
|
||||
# 新版:使用完整的login_state
|
||||
print("✅ 检测到login_state,将恢复完整登录状态", file=sys.stderr)
|
||||
|
||||
# 保存login_state到文件,供 init_browser 加载
|
||||
with open('login_state.json', 'w', encoding='utf-8') as f:
|
||||
json.dump(request.login_state, f, ensure_ascii=False, indent=2)
|
||||
|
||||
# 使用restore_state=True恢复完整状态
|
||||
await inject_service.init_browser(restore_state=True)
|
||||
|
||||
elif request.cookies:
|
||||
# 兼容旧版:仅使用Cookies
|
||||
print("⚠️ 检测到仅有Cookies,建议使用login_state获取更好的兼容性", file=sys.stderr)
|
||||
await inject_service.init_browser(cookies=request.cookies)
|
||||
|
||||
else:
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message="请提供 cookies 或 login_state",
|
||||
data=None
|
||||
)
|
||||
|
||||
# 根据target_page参数确定验证URL
|
||||
target_page = request.target_page or "creator"
|
||||
if target_page == "home":
|
||||
verify_url = "https://www.xiaohongshu.com"
|
||||
page_name = "小红书首页"
|
||||
else:
|
||||
verify_url = "https://creator.xiaohongshu.com"
|
||||
page_name = "创作者中心"
|
||||
|
||||
# 访问目标页面并验证登录状态
|
||||
result = await inject_service.verify_login_status(url=verify_url)
|
||||
|
||||
# 关闭独立的浏览器实例(注:因为不是池模式,会真正关闭)
|
||||
# await inject_service.close_browser() # 先不关闭,让用户看到结果
|
||||
|
||||
if result.get("logged_in"):
|
||||
return BaseResponse(
|
||||
code=0,
|
||||
message="Cookie注入成功,已登录",
|
||||
message=f"{'login_state' if request.login_state else 'Cookie'}注入成功,已跳转到{page_name}",
|
||||
data={
|
||||
"logged_in": True,
|
||||
"target_page": page_name,
|
||||
"user_info": result.get("user_info"),
|
||||
"cookies": result.get("cookies"), # 键值对格式
|
||||
"cookies_full": result.get("cookies_full"), # Playwright完整格式
|
||||
@@ -172,37 +624,117 @@ async def inject_cookies(request: InjectCookiesRequest):
|
||||
}
|
||||
)
|
||||
else:
|
||||
# 失败时关闭浏览器
|
||||
await inject_service.close_browser()
|
||||
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message=result.get("message", "Cookie已失效,请重新登录"),
|
||||
message=result.get("message", f"{'login_state' if request.login_state else 'Cookie'}已失效,请重新登录"),
|
||||
data={
|
||||
"logged_in": False
|
||||
}
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
print(f"注入Cookies异常: {str(e)}")
|
||||
print(f"注入失败: {str(e)}", file=sys.stderr)
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message=f"注入失败: {str(e)}",
|
||||
data=None
|
||||
)
|
||||
|
||||
@app.post("/api/xhs/publish", response_model=BaseResponse)
|
||||
async def publish_note(request: PublishNoteRequest):
|
||||
@app.post("/api/xhs/publish-with-cookies", response_model=BaseResponse)
|
||||
async def publish_note_with_cookies(request: PublishWithCookiesRequest):
|
||||
"""
|
||||
发布笔记
|
||||
登录后可以发布图文笔记到小红书
|
||||
使用Cookies或完整login_state或storage_state发布笔记(供Go后端定时任务调用)
|
||||
支持三种模式(按优先级):
|
||||
1. 使用storage_state_path(推荐,最完整的登录状态)
|
||||
2. 传入完整login_state(次选,包含cookies + localStorage + sessionStorage)
|
||||
3. 仅传入Cookies(兼容旧版)
|
||||
|
||||
重要:为了避免检测,不使用浏览器池,每次创建全新的浏览器实例
|
||||
"""
|
||||
try:
|
||||
# 调用登录服务发布笔记
|
||||
result = await login_service.publish_note(
|
||||
# 获取代理(如果启用)
|
||||
proxy = await fetch_proxy_from_pool()
|
||||
if proxy:
|
||||
print(f"[发布接口] 使用代理: {proxy}", file=sys.stderr)
|
||||
|
||||
# 创建一个独立的登录服务实例,不使用浏览器池,应用所有反检测措施
|
||||
print("✅ 为发布任务创建全新的浏览器实例,不使用浏览器池", file=sys.stderr)
|
||||
|
||||
# 从配置读取headless参数
|
||||
config = get_config()
|
||||
headless = config.get_bool('scheduler.headless', True)
|
||||
|
||||
publish_service = XHSLoginService(use_pool=False, headless=headless) # 不使用浏览器池
|
||||
|
||||
# 优先级判断:storage_state_path > login_state > cookies
|
||||
if request.storage_state_path or request.phone:
|
||||
# 模式1:使用storage_state(最优先)
|
||||
storage_state_file = None
|
||||
|
||||
if request.storage_state_path:
|
||||
# 直接指定了storage_state路径
|
||||
storage_state_file = request.storage_state_path
|
||||
elif request.phone:
|
||||
# 根据手机号查找
|
||||
storage_state_dir = 'storage_states'
|
||||
storage_state_file = os.path.join(storage_state_dir, f"xhs_{request.phone}.json")
|
||||
|
||||
if storage_state_file and os.path.exists(storage_state_file):
|
||||
print(f"✅ 检测到storage_state文件: {storage_state_file},将使用Playwright原生恢复", file=sys.stderr)
|
||||
|
||||
# 使用Playwright原生API恢复登录状态
|
||||
await publish_service.init_browser_with_storage_state(
|
||||
storage_state_path=storage_state_file,
|
||||
proxy=proxy
|
||||
)
|
||||
else:
|
||||
print(f"⚠️ storage_state文件不存在: {storage_state_file},回退到login_state或cookies模式", file=sys.stderr)
|
||||
# 回退到旧模式
|
||||
if request.login_state:
|
||||
await _init_with_login_state(publish_service, request.login_state, proxy)
|
||||
elif request.cookies:
|
||||
await publish_service.init_browser(cookies=request.cookies, proxy=proxy)
|
||||
else:
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message="storage_state文件不存在,且未提供 login_state 或 cookies",
|
||||
data=None
|
||||
)
|
||||
|
||||
elif request.login_state:
|
||||
# 模式2:使用login_state
|
||||
print("✅ 检测到login_state,将恢复完整登录状态", file=sys.stderr)
|
||||
await _init_with_login_state(publish_service, request.login_state, proxy)
|
||||
|
||||
elif request.cookies:
|
||||
# 模式3:仅使用Cookies(兼容旧版)
|
||||
print("⚠️ 检测到仅有Cookies,建议使用storage_state或login_state获取更好的兼容性", file=sys.stderr)
|
||||
await publish_service.init_browser(cookies=request.cookies, proxy=proxy)
|
||||
else:
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message="请提供 storage_state_path、phone、login_state 或 cookies",
|
||||
data=None
|
||||
)
|
||||
|
||||
# 调用发布方法(使用已经初始化好的publish_service)
|
||||
result = await publish_service.publish_note(
|
||||
title=request.title,
|
||||
content=request.content,
|
||||
images=request.images,
|
||||
topics=request.topics
|
||||
topics=request.topics,
|
||||
cookies=None, # 已经注入,不需要再传
|
||||
proxy=None, # 已经设置,不需要再传
|
||||
)
|
||||
|
||||
# 关闭独立的浏览器实例
|
||||
await publish_service.close_browser()
|
||||
|
||||
if result["success"]:
|
||||
return BaseResponse(
|
||||
code=0,
|
||||
@@ -220,13 +752,55 @@ async def publish_note(request: PublishNoteRequest):
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
print(f"发布笔记异常: {str(e)}")
|
||||
print(f"发布笔记异常: {str(e)}", file=sys.stderr)
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return BaseResponse(
|
||||
code=1,
|
||||
message=f"发布失败: {str(e)}",
|
||||
data=None
|
||||
)
|
||||
|
||||
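The priority order used by the publish endpoint (storage_state file, then `login_state`, then cookies) can be expressed as a pure function, which makes the fallback behaviour easy to verify. This is a sketch under assumptions: `choose_login_source` is a hypothetical name, and the `exists` parameter is injected only so the logic can be tested without touching the filesystem.

```python
import os
from typing import Callable, Optional, Tuple


def choose_login_source(
    storage_state_path: Optional[str] = None,
    phone: Optional[str] = None,
    login_state: Optional[dict] = None,
    cookies: Optional[list] = None,
    storage_dir: str = "storage_states",
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (mode, path) following storage_state > login_state > cookies."""
    candidate = storage_state_path
    if not candidate and phone:
        # Same naming scheme as the endpoint: storage_states/xhs_<phone>.json
        candidate = os.path.join(storage_dir, f"xhs_{phone}.json")
    if candidate and exists(candidate):
        return ("storage_state", candidate)
    # A missing file falls back to the older modes.
    if login_state:
        return ("login_state", None)
    if cookies:
        return ("cookies", None)
    return None  # caller should respond with an error
```

A `None` result corresponds to the two error responses in the endpoint, where no usable credentials were supplied.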
async def _init_with_login_state(publish_service, login_state, proxy):
|
||||
"""使用login_state初始化浏览器"""
|
||||
# 保存login_state到临时文件
|
||||
import tempfile
|
||||
import uuid
|
||||
temp_file = os.path.join(tempfile.gettempdir(), f"login_state_{uuid.uuid4()}.json")
|
||||
with open(temp_file, 'w', encoding='utf-8') as f:
|
||||
json.dump(login_state, f, ensure_ascii=False, indent=2)
|
||||
|
||||
# Initialize the browser with the cookies and user agent from login_state (restore_state is not passed here)
|
||||
await publish_service.init_browser(
|
||||
cookies=login_state.get('cookies'),
|
||||
proxy=proxy,
|
||||
user_agent=login_state.get('user_agent')
|
||||
)
|
||||
|
||||
# Restore localStorage and sessionStorage
|
||||
try:
|
||||
if login_state.get('localStorage') or login_state.get('sessionStorage'):
|
||||
target_url = login_state.get('url', 'https://creator.xiaohongshu.com')
|
||||
await publish_service.page.goto(target_url, wait_until='domcontentloaded', timeout=15000)
|
||||
|
||||
if login_state.get('localStorage'):
|
||||
for key, value in login_state['localStorage'].items():
|
||||
await publish_service.page.evaluate(f'localStorage.setItem({json.dumps(key)}, {json.dumps(value)})')
|
||||
|
||||
if login_state.get('sessionStorage'):
|
||||
for key, value in login_state['sessionStorage'].items():
|
||||
await publish_service.page.evaluate(f'sessionStorage.setItem({json.dumps(key)}, {json.dumps(value)})')
|
||||
|
||||
print("✅ 已恢复localStorage和sessionStorage", file=sys.stderr)
|
||||
except Exception as e:
|
||||
print(f"⚠️ 恢复storage失败: {str(e)}", file=sys.stderr)
|
||||
|
||||
# 清理临时文件
|
||||
try:
|
||||
os.remove(temp_file)
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
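When storage entries are restored by building a JavaScript snippet, both the key and the value need to be serialized as string literals, otherwise a key containing a quote breaks the script. A minimal sketch of a safe builder, assuming the snippet is later passed to Playwright's `page.evaluate`; `build_set_item_script` is a hypothetical helper name:

```python
import json


def build_set_item_script(storage_name: str, key: str, value: str) -> str:
    """Build a `<storage>.setItem(key, value)` snippet with both arguments escaped."""
    if storage_name not in ("localStorage", "sessionStorage"):
        raise ValueError(f"unsupported storage: {storage_name}")
    # json.dumps emits a valid JavaScript string literal, including
    # escapes for quotes, backslashes and newlines.
    return f"{storage_name}.setItem({json.dumps(key)}, {json.dumps(value)})"
```

An alternative that avoids string building altogether is to pass the pair as the argument of `page.evaluate`, for example `await page.evaluate("([k, v]) => localStorage.setItem(k, v)", [key, value])`.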
@app.post("/api/xhs/upload-images")
|
||||
async def upload_images(files: List[UploadFile] = File(...)):
|
||||
"""
|
||||
@@ -279,4 +853,20 @@ async def upload_images(files: List[UploadFile] = File(...)):
|
||||
|
||||
if __name__ == "__main__":
|
||||
import uvicorn
|
||||
uvicorn.run(app, host="0.0.0.0", port=8000)
|
||||
|
||||
# 从配置文件读取服务器配置
|
||||
config = get_config()
|
||||
host = config.get_str('server.host', '0.0.0.0')
|
||||
port = config.get_int('server.port', 8000)
|
||||
debug = config.get_bool('server.debug', False)
|
||||
reload = config.get_bool('server.reload', False)
|
||||
|
||||
print(f"[启动服务] 主机: {host}, 端口: {port}, 调试: {debug}, 热重载: {reload}")
|
||||
|
||||
uvicorn.run(
|
||||
"main:app" if reload else app,  # uvicorn requires an import string when reload is enabled
|
||||
host=host,
|
||||
port=port,
|
||||
reload=reload,
|
||||
log_level="debug" if debug else "info"
|
||||
)
|
||||
|
||||
157
backend/oss_utils.py
Normal file
@@ -0,0 +1,157 @@
|
||||
"""
|
||||
阿里云OSS工具类
|
||||
用于Python脚本中上传/下载文件到OSS
|
||||
"""
|
||||
import os
|
||||
import oss2
|
||||
from datetime import datetime
|
||||
from typing import Optional
|
||||
|
||||
|
||||
class OSSUploader:
|
||||
"""OSS上传工具"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
access_key_id: Optional[str] = None,
|
||||
access_key_secret: Optional[str] = None,
|
||||
bucket_name: Optional[str] = None,
|
||||
endpoint: Optional[str] = None
|
||||
):
|
||||
"""
|
||||
初始化OSS客户端
|
||||
|
||||
Args:
|
||||
access_key_id: AccessKey ID(可选,默认从环境变量读取)
|
||||
access_key_secret: AccessKey Secret(可选,默认从环境变量读取)
|
||||
bucket_name: Bucket名称(可选,默认从环境变量读取)
|
||||
endpoint: OSS访问域名(可选,默认从环境变量读取)
|
||||
"""
|
||||
# 使用提供的值或从环境变量读取
|
||||
self.access_key_id = access_key_id or os.getenv('OSS_TEST_ACCESS_KEY_ID', '')  # never hard-code real credentials as defaults
|
||||
self.access_key_secret = access_key_secret or os.getenv('OSS_TEST_ACCESS_KEY_SECRET', '')  # never hard-code real credentials as defaults
|
||||
self.bucket_name = bucket_name or os.getenv('OSS_TEST_BUCKET', 'bxmkb-beijing')
|
||||
self.endpoint = endpoint or os.getenv('OSS_TEST_ENDPOINT', 'https://oss-cn-beijing.aliyuncs.com/')
|
||||
|
||||
# 移除endpoint中的协议前缀(oss2库不需要https://)
|
||||
self.endpoint = self.endpoint.replace('https://', '').replace('http://', '').rstrip('/')
|
||||
|
||||
# 创建认证对象
|
||||
self.auth = oss2.Auth(self.access_key_id, self.access_key_secret)
|
||||
|
||||
# 创建Bucket对象
|
||||
self.bucket = oss2.Bucket(self.auth, self.endpoint, self.bucket_name)
|
||||
|
||||
# 基础路径
|
||||
self.base_path = "wht/"
|
||||
|
||||
    def upload_file(self, local_file_path: str, object_name: Optional[str] = None) -> str:
        """
        Upload a local file to OSS.

        Args:
            local_file_path: Path of the local file.
            object_name: OSS object name (optional; generated automatically by default).

        Returns:
            The full URL of the OSS object.
        """
        # Generate an object name when none is given
        if object_name is None:
            # Format: wht/YYYYMMDD/timestamp_filename.ext
            now = datetime.now()
            date_dir = now.strftime("%Y%m%d")
            timestamp = int(now.timestamp())
            filename = os.path.basename(local_file_path)
            object_name = f"{self.base_path}{date_dir}/{timestamp}_{filename}"

        # Upload the file
        self.bucket.put_object_from_file(object_name, local_file_path)

        # Build the access URL
        url = f"https://{self.bucket_name}.{self.endpoint}/{object_name}"

        return url
|
||||
|
||||
    def upload_bytes(self, data: bytes, filename: str) -> str:
        """
        Upload raw bytes to OSS.

        Args:
            data: File content as bytes.
            filename: File name (used to derive the extension).

        Returns:
            The full URL of the OSS object.
        """
        # Generate the object name
        now = datetime.now()
        date_dir = now.strftime("%Y%m%d")
        timestamp = int(now.timestamp())

        # Get the extension (falls back to .jpg).
        # Note: ext is computed but not used; the object name keeps the full filename.
        ext = os.path.splitext(filename)[1] or '.jpg'
        object_name = f"{self.base_path}{date_dir}/{timestamp}_{filename}"

        # Upload the data
        self.bucket.put_object(object_name, data)

        # Build the access URL
        url = f"https://{self.bucket_name}.{self.endpoint}/{object_name}"

        return url
|
||||
|
||||
    def delete_file(self, file_url: str) -> bool:
        """
        Delete a file from OSS.

        Args:
            file_url: Full URL of the OSS object.

        Returns:
            True if the object was deleted, False otherwise.
        """
        try:
            # Extract the object name from the URL
            # Format: https://bucket.endpoint/path/file.jpg
            prefix = f"https://{self.bucket_name}.{self.endpoint}/"
            if file_url.startswith(prefix):
                object_name = file_url[len(prefix):]
                self.bucket.delete_object(object_name)
                return True
            else:
                return False
        except Exception as e:
            print(f"Failed to delete OSS file: {e}")
            return False
|
||||
|
||||
    def file_exists(self, file_url: str) -> bool:
        """
        Check whether a file exists in OSS.

        Args:
            file_url: Full URL of the OSS object.

        Returns:
            True if the object exists, False otherwise (including on errors).
        """
        try:
            prefix = f"https://{self.bucket_name}.{self.endpoint}/"
            if file_url.startswith(prefix):
                object_name = file_url[len(prefix):]
                return self.bucket.object_exists(object_name)
            else:
                return False
        except Exception:
            return False
|
||||
|
||||
|
||||
# Default instance, created lazily (configured from environment variables)
default_uploader = None


def get_oss_uploader() -> OSSUploader:
    """Return the default OSS uploader instance."""
    global default_uploader
    if default_uploader is None:
        default_uploader = OSSUploader()
    return default_uploader
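
The uploader above generates object names as `wht/YYYYMMDD/timestamp_filename` and later recovers the object name from a URL by stripping the `https://bucket.endpoint/` prefix. The following is a minimal sketch of that naming and round trip, with no network access. The helper names `make_object_key`, `build_url` and `url_to_object_key` are hypothetical and not part of the commit, and the bucket and endpoint values are placeholders.

```python
import os
from datetime import datetime
from typing import Optional


def make_object_key(filename: str, base_path: str = "wht/",
                    now: Optional[datetime] = None) -> str:
    """Build a key shaped like wht/YYYYMMDD/timestamp_filename (hypothetical helper)."""
    now = now or datetime.now()
    date_dir = now.strftime("%Y%m%d")
    timestamp = int(now.timestamp())
    return f"{base_path}{date_dir}/{timestamp}_{os.path.basename(filename)}"


def build_url(bucket: str, endpoint: str, key: str) -> str:
    """Build the public URL; endpoint is assumed to have no scheme or trailing slash."""
    return f"https://{bucket}.{endpoint}/{key}"


def url_to_object_key(url: str, bucket: str, endpoint: str) -> Optional[str]:
    """Recover the object key from a URL, or None if the URL belongs elsewhere."""
    prefix = f"https://{bucket}.{endpoint}/"
    return url[len(prefix):] if url.startswith(prefix) else None


if __name__ == "__main__":
    key = make_object_key("/tmp/photo.jpg")
    url = build_url("demo-bucket", "oss-cn-beijing.aliyuncs.com", key)
    print(url)
    print(url_to_object_key(url, "demo-bucket", "oss-cn-beijing.aliyuncs.com"))
```

The round trip only works when the endpoint used to build the URL is byte-for-byte the same as the one used to strip the prefix, which is why the endpoint is best normalized once at construction time.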
|
||||
66
backend/proxy_test_report.py
Normal file
@@ -0,0 +1,66 @@
|
||||
"""
|
||||
固定代理IP测试总结报告
|
||||
"""
|
||||
print("="*60)
|
||||
print("🎯 固定代理IP测试总结报告")
|
||||
print("="*60)
|
||||
|
||||
print("\n📋 测试概览:")
|
||||
print(" • 测试项目: 固定代理IP在小红书登录发文功能中的可用性")
|
||||
print(" • 测试时间: 2025年12月26日")
|
||||
print(" • 测试环境: Windows 10, Python虚拟环境")
|
||||
print(" • 测试代理数量: 2个")
|
||||
|
||||
print("\n✅ 代理IP详细信息:")
|
||||
print(" 代理1:")
|
||||
print(" - 服务器: http://36.137.177.131:50001")
|
||||
print(" - 用户名: qqwvy0")
|
||||
print(" - 密码: mun3r7xz")
|
||||
print(" - 状态: ✅ 可用")
|
||||
print("")
|
||||
print(" 代理2:")
|
||||
print(" - 服务器: http://111.132.40.72:50002")
|
||||
print(" - 用户名: ih3z07")
|
||||
print(" - 密码: 078bt7o5")
|
||||
print(" - 状态: ✅ 可用")
|
||||
|
||||
print("\n🧪 测试项目及结果:")
|
||||
print(" 1. requests库连接测试:")
|
||||
print(" - 代理1: ✅ 成功")
|
||||
print(" - 代理2: ✅ 成功")
|
||||
print(" - 结论: 代理IP基础连接正常")
|
||||
print("")
|
||||
print(" 2. Playwright浏览器代理测试:")
|
||||
print(" - 代理1: ✅ 成功 (可访问小红书创作者平台)")
|
||||
print(" - 代理2: ✅ 成功 (可访问小红书创作者平台)")
|
||||
print(" - 结论: 代理IP在浏览器环境中正常工作")
|
||||
print("")
|
||||
print(" 3. 网站访问能力测试:")
|
||||
print(" - 百度访问: ✅ 成功")
|
||||
print(" - IP检测网站: ✅ 成功")
|
||||
print(" - 小红书创作者平台: ✅ 成功")
|
||||
print(" - 结论: 代理IP未被目标网站封禁")
|
||||
|
||||
print("\n📊 测试结果汇总:")
|
||||
print(" • 总体成功率: 100% (2/2 个代理可用)")
|
||||
print(" • 网络延迟: 良好")
|
||||
print(" • 稳定性: 良好")
|
||||
print(" • 适用场景: 小红书登录及发文功能")
|
||||
|
||||
print("\n🔧 Playwright代理格式:")
|
||||
print(" 代理1格式: http://qqwvy0:mun3r7xz@36.137.177.131:50001")
|
||||
print(" 代理2格式: http://ih3z07:078bt7o5@111.132.40.72:50002")
|
||||
|
||||
print("\n💡 使用建议:")
|
||||
print(" 1. 在小红书自动化脚本中,可以使用以上两个代理IP")
|
||||
print(" 2. 建议轮换使用两个代理以提高稳定性")
|
||||
print(" 3. 如遇到访问限制,可尝试调整User-Agent或请求间隔")
|
||||
print(" 4. 代理IP可以有效隐藏真实IP,降低被封禁风险")
|
||||
|
||||
print("\n🎉 总结:")
|
||||
print(" 两个固定代理IP均可以正常用于小红书登录发文功能,")
|
||||
print(" 网络连接稳定,未检测到访问限制或验证码拦截。")
|
||||
|
||||
print("\n" + "="*60)
|
||||
print("报告生成完成")
|
||||
print("="*60)
|
||||
230
backend/proxy_usage_example.py
Normal file
@@ -0,0 +1,230 @@
|
||||
"""
|
||||
固定代理IP下小红书登录和发文功能示例
|
||||
展示如何在实际应用中使用代理IP进行小红书操作
|
||||
"""
|
||||
import asyncio
|
||||
import json
|
||||
import sys
|
||||
from xhs_login import XHSLoginService
|
||||
from xhs_publish import XHSPublishService
|
||||
from damai_proxy_config import get_proxy_config
|
||||
|
||||
|
||||
async def login_with_proxy(phone: str, code: str, proxy_index: int = 0):
|
||||
"""
|
||||
使用代理进行小红书登录
|
||||
|
||||
Args:
|
||||
phone: 手机号
|
||||
code: 验证码
|
||||
proxy_index: 代理索引 (0 或 1)
|
||||
"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"📱 使用代理登录小红书")
|
||||
print(f"{'='*60}")
|
||||
|
||||
# 获取代理配置
|
||||
proxy_config = get_proxy_config(proxy_index)
|
||||
proxy_server = proxy_config['server'].replace('http://', '')
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
|
||||
|
||||
print(f"✅ 使用代理: 代理{proxy_index + 1}")
|
||||
print(f" 代理服务器: {proxy_config['server']}")
|
||||
|
||||
# 创建登录服务
|
||||
login_service = XHSLoginService()
|
||||
|
||||
try:
|
||||
# 初始化浏览器(使用代理)
|
||||
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
|
||||
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
|
||||
print("✅ 浏览器初始化成功(已启用代理)")
|
||||
|
||||
# 执行登录
|
||||
result = await login_service.login(phone, code)
|
||||
|
||||
if result.get('success'):
|
||||
print("✅ 登录成功!")
|
||||
|
||||
# 保存Cookies到文件
|
||||
cookies_full = result.get('cookies_full', [])
|
||||
if cookies_full:
|
||||
with open('cookies_proxy.json', 'w', encoding='utf-8') as f:
|
||||
json.dump(cookies_full, f, ensure_ascii=False, indent=2)
|
||||
print("✅ 已保存登录Cookies到 cookies_proxy.json")
|
||||
|
||||
return result
|
||||
else:
|
||||
print(f"❌ 登录失败: {result.get('error')}")
|
||||
return result
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 登录过程异常: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return {"success": False, "error": str(e)}
|
||||
finally:
|
||||
await login_service.close_browser()
|
||||
|
||||
|
||||
async def publish_with_proxy(title: str, content: str, images: list = None, tags: list = None, proxy_index: int = 0, cookies_file: str = 'cookies.json'):
|
||||
"""
|
||||
使用代理发布小红书笔记
|
||||
|
||||
Args:
|
||||
title: 笔记标题
|
||||
content: 笔记内容
|
||||
images: 图片路径列表
|
||||
tags: 标签列表
|
||||
proxy_index: 代理索引 (0 或 1)
|
||||
cookies_file: Cookies文件路径
|
||||
"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"📝 使用代理发布小红书笔记")
|
||||
print(f"{'='*60}")
|
||||
|
||||
# 获取代理配置
|
||||
proxy_config = get_proxy_config(proxy_index)
|
||||
proxy_server = proxy_config['server'].replace('http://', '')
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
|
||||
|
||||
print(f"✅ 使用代理: 代理{proxy_index + 1}")
|
||||
print(f" 代理服务器: {proxy_config['server']}")
|
||||
|
||||
# 读取Cookies
|
||||
try:
|
||||
with open(cookies_file, 'r', encoding='utf-8') as f:
|
||||
cookies = json.load(f)
|
||||
print(f"✅ 成功读取Cookies: {len(cookies)} 个")
|
||||
except FileNotFoundError:
|
||||
print(f"❌ Cookies文件不存在: {cookies_file}")
|
||||
return {"success": False, "error": f"Cookies文件不存在: {cookies_file}"}
|
||||
except Exception as e:
|
||||
print(f"❌ 读取Cookies失败: {str(e)}")
|
||||
return {"success": False, "error": str(e)}
|
||||
|
||||
# 准备发布数据
|
||||
images = images or []
|
||||
tags = tags or []
|
||||
|
||||
print(f"📝 发布内容:")
|
||||
print(f" 标题: {title}")
|
||||
print(f" 内容: {content[:50]}...") # 只显示前50个字符
|
||||
print(f" 图片: {len(images)} 张")
|
||||
print(f" 标签: {tags}")
|
||||
|
||||
# 创建发布服务
|
||||
try:
|
||||
publisher = XHSPublishService(cookies, proxy=proxy_url)
|
||||
|
||||
# 执行发布
|
||||
result = await publisher.publish(
|
||||
title=title,
|
||||
content=content,
|
||||
images=images,
|
||||
tags=tags
|
||||
)
|
||||
|
||||
if result.get('success'):
|
||||
print("✅ 发布成功!")
|
||||
else:
|
||||
print(f"❌ 发布失败: {result.get('error')}")
|
||||
|
||||
return result
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 发布过程异常: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return {"success": False, "error": str(e)}
|
||||
|
||||
|
||||
async def test_proxy_functionality():
|
||||
"""测试代理功能的完整流程"""
|
||||
print("🚀 开始测试代理功能完整流程")
|
||||
|
||||
# 1. 测试代理连接
|
||||
print(f"\n{'-'*40}")
|
||||
print("1. 测试代理连接...")
|
||||
|
||||
for i in range(2):
|
||||
proxy_config = get_proxy_config(i)
|
||||
proxy_server = proxy_config['server'].replace('http://', '')
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
|
||||
|
||||
print(f" 代理{i+1}: {proxy_config['server']} - {proxy_url}")
|
||||
|
||||
# 2. 演示如何使用代理登录(仅展示,不实际执行)
|
||||
print(f"\n{'-'*40}")
|
||||
print("2. 代理登录示例(代码演示)...")
|
||||
print("""
|
||||
# 登录示例代码:
|
||||
async def example_login():
|
||||
result = await login_with_proxy(
|
||||
phone="你的手机号", # 实际手机号
|
||||
code="验证码", # 实际验证码
|
||||
proxy_index=0 # 使用代理1
|
||||
)
|
||||
return result
|
||||
""")
|
||||
|
||||
# 3. 演示如何使用代理发布(仅展示,不实际执行)
|
||||
print(f"\n{'-'*40}")
|
||||
print("3. 代理发布示例(代码演示)...")
|
||||
print("""
|
||||
# 发布示例代码:
|
||||
async def example_publish():
|
||||
result = await publish_with_proxy(
|
||||
title="测试标题",
|
||||
content="测试内容",
|
||||
images=["图片路径1", "图片路径2"], # 可选
|
||||
tags=["标签1", "标签2"], # 可选
|
||||
proxy_index=1, # 使用代理2
|
||||
cookies_file="cookies.json" # Cookies文件路径
|
||||
)
|
||||
return result
|
||||
""")
|
||||
|
||||
# 4. 代理轮换策略
|
||||
print(f"\n{'-'*40}")
|
||||
print("4. 代理轮换策略...")
|
||||
print("""
|
||||
# 代理轮换示例:
|
||||
class ProxyManager:
|
||||
def __init__(self):
|
||||
self.current_proxy = 0
|
||||
|
||||
def get_next_proxy(self):
|
||||
proxy_config = get_proxy_config(self.current_proxy)
|
||||
self.current_proxy = (self.current_proxy + 1) % 2 # 循环使用两个代理
|
||||
return proxy_config
|
||||
""")
|
||||
|
||||
print(f"\n{'-'*40}")
|
||||
print("✅ 代理功能演示完成!")
|
||||
|
||||
|
||||
def main():
|
||||
"""主函数"""
|
||||
print("="*60)
|
||||
print("🎯 固定代理IP下小红书登录发文功能示例")
|
||||
print("="*60)
|
||||
|
||||
# 运行测试
|
||||
asyncio.run(test_proxy_functionality())
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("💡 使用说明:")
|
||||
print(" 1. 使用 login_with_proxy() 函数进行带代理的登录")
|
||||
print(" 2. 使用 publish_with_proxy() 函数进行带代理的发布")
|
||||
print(" 3. 可以轮换使用两个代理IP以提高稳定性")
|
||||
print(" 4. 代理配置在 damai_proxy_config.py 中管理")
|
||||
print("="*60)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Windows环境下设置事件循环策略
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
main()
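
The functions in this file assemble the proxy URL with an f-string of the form `http://username:password@host:port`. If a username or password ever contains reserved characters such as `@` or `:`, that URL becomes ambiguous. The sketch below percent-encodes the credentials first. `build_proxy_url` is a hypothetical helper, not part of the commit; it assumes the same `server`, `username` and `password` keys that `get_proxy_config()` is shown returning above, and it keeps the scheme from `server` when one is present.

```python
from urllib.parse import quote, urlsplit


def build_proxy_url(proxy_config: dict) -> str:
    """Return scheme://user:password@host:port with percent-encoded credentials."""
    server = proxy_config["server"]
    # Default to http:// when the server entry carries no scheme
    if "://" not in server:
        server = "http://" + server
    parts = urlsplit(server)
    user = quote(proxy_config["username"], safe="")
    password = quote(proxy_config["password"], safe="")
    return f"{parts.scheme}://{user}:{password}@{parts.netloc}"


if __name__ == "__main__":
    # Placeholder values for illustration only
    demo = {"server": "http://192.0.2.10:8080", "username": "user", "password": "p@ss:1"}
    print(build_proxy_url(demo))
```

Whether the consuming code accepts credentials embedded in the URL depends on how `init_browser` and `XHSPublishService` pass the proxy on, which this commit does not show.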
|
||||
65
backend/rebuild_venv.bat
Normal file
@@ -0,0 +1,65 @@
|
||||
@echo off
|
||||
chcp 65001 >nul
|
||||
echo ========================================
|
||||
echo 重建虚拟环境(使用标准Python)
|
||||
echo ========================================
|
||||
echo.
|
||||
|
||||
cd /d "%~dp0"
|
||||
|
||||
echo [步骤1] 删除旧的虚拟环境...
|
||||
if exist venv (
|
||||
rmdir /s /q venv
|
||||
echo [完成] 旧虚拟环境已删除
|
||||
) else (
|
||||
echo [提示] 没有找到旧虚拟环境
|
||||
)
|
||||
echo.
|
||||
|
||||
echo [步骤2] 使用标准Python 3.12创建新虚拟环境...
|
||||
py -3.12 -m venv venv
|
||||
if %errorlevel% neq 0 (
|
||||
echo [错误] 虚拟环境创建失败
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
echo [完成] 虚拟环境创建成功
|
||||
echo.
|
||||
|
||||
echo [步骤3] 验证新虚拟环境的Python路径...
|
||||
venv\Scripts\python.exe -c "import sys; print('Python可执行文件:', sys.executable); print('Python版本:', sys.version); print('\nsys.path前5行:'); [print(p) for i, p in enumerate(sys.path[:5])]"
|
||||
echo.
|
||||
|
||||
echo [步骤4] 升级pip...
|
||||
venv\Scripts\python.exe -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
|
||||
echo.
|
||||
|
||||
echo [步骤5] 配置pip使用清华镜像...
|
||||
venv\Scripts\pip.exe config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
|
||||
echo [完成] pip镜像源已配置
|
||||
echo.
|
||||
|
||||
echo [步骤6] 安装项目依赖...
|
||||
venv\Scripts\pip.exe install -r requirements.txt
|
||||
if %errorlevel% neq 0 (
|
||||
echo [错误] 依赖安装失败
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
echo [完成] 依赖安装成功
|
||||
echo.
|
||||
|
||||
echo [步骤7] 安装Playwright浏览器...
|
||||
venv\Scripts\playwright.exe install chromium
|
||||
if %errorlevel% neq 0 (
|
||||
echo [警告] Playwright浏览器安装可能失败,请手动检查
|
||||
)
|
||||
echo.
|
||||
|
||||
echo ========================================
|
||||
echo 虚拟环境重建完成!
|
||||
echo ========================================
|
||||
echo.
|
||||
echo 现在可以运行 start_service.bat 启动服务
|
||||
echo.
|
||||
pause
|
||||
@@ -4,3 +4,12 @@ playwright==1.40.0
|
||||
pydantic==2.5.0
|
||||
python-multipart==0.0.6
|
||||
aiohttp==3.9.1
|
||||
oss2==2.18.4
|
||||
APScheduler==3.10.4
|
||||
PyMySQL==1.1.0
|
||||
python-dotenv==1.0.0
|
||||
PyYAML==6.0.1
|
||||
alibabacloud_dysmsapi20170525==2.0.24
|
||||
alibabacloud_credentials==0.3.4
|
||||
alibabacloud_tea_openapi==0.3.9
|
||||
alibabacloud_tea_util==0.3.13
|
||||
|
||||
563
backend/scheduler.py
Normal file
@@ -0,0 +1,563 @@
|
||||
"""
|
||||
小红书定时发布调度器
|
||||
管理自动发布任务的调度和执行
|
||||
"""
|
||||
import asyncio
|
||||
import sys
|
||||
import random
|
||||
from datetime import datetime, time as dt_time
|
||||
from typing import List, Dict, Any, Optional
|
||||
from apscheduler.schedulers.asyncio import AsyncIOScheduler
|
||||
from apscheduler.triggers.cron import CronTrigger
|
||||
import pymysql
|
||||
import json
|
||||
import aiohttp
|
||||
|
||||
from xhs_login import XHSLoginService
|
||||
|
||||
|
||||
class XHSScheduler:
|
||||
"""小红书定时发布调度器"""
|
||||
|
||||
def __init__(self,
|
||||
db_config: Dict[str, Any],
|
||||
max_concurrent: int = 2,
|
||||
publish_timeout: int = 300,
|
||||
max_articles_per_user_per_run: int = 5,
|
||||
max_failures_per_user_per_run: int = 3,
|
||||
max_daily_articles_per_user: int = 20,
|
||||
max_hourly_articles_per_user: int = 3,
|
||||
proxy_pool_enabled: bool = False,
|
||||
proxy_pool_api_url: Optional[str] = None,
|
||||
enable_random_ua: bool = True,
|
||||
min_publish_interval: int = 30,
|
||||
max_publish_interval: int = 120,
|
||||
headless: bool = True):
|
||||
"""
|
||||
初始化调度器
|
||||
|
||||
Args:
|
||||
db_config: 数据库配置
|
||||
max_concurrent: 最大并发发布数
|
||||
publish_timeout: 发布超时时间(秒)
|
||||
max_articles_per_user_per_run: 每轮每用户最大发文数
|
||||
max_failures_per_user_per_run: 每轮每用户最大失败次数
|
||||
max_daily_articles_per_user: 每用户每日最大发文数
|
||||
max_hourly_articles_per_user: 每用户每小时最大发文数
|
||||
enable_random_ua: 是否启用随机User-Agent
|
||||
min_publish_interval: 最小发布间隔(秒)
|
||||
max_publish_interval: 最大发布间隔(秒)
|
||||
headless: 是否使用无头模式,False为有头模式(方便调试)
|
||||
"""
|
||||
self.db_config = db_config
|
||||
self.max_concurrent = max_concurrent
|
||||
self.publish_timeout = publish_timeout
|
||||
self.max_articles_per_user_per_run = max_articles_per_user_per_run
|
||||
self.max_failures_per_user_per_run = max_failures_per_user_per_run
|
||||
self.max_daily_articles_per_user = max_daily_articles_per_user
|
||||
self.max_hourly_articles_per_user = max_hourly_articles_per_user
|
||||
self.proxy_pool_enabled = proxy_pool_enabled
|
||||
self.proxy_pool_api_url = proxy_pool_api_url or ""
|
||||
self.enable_random_ua = enable_random_ua
|
||||
self.min_publish_interval = min_publish_interval
|
||||
self.max_publish_interval = max_publish_interval
|
||||
self.headless = headless
|
||||
|
||||
self.scheduler = AsyncIOScheduler()
|
||||
self.login_service = XHSLoginService(use_pool=True, headless=headless)
|
||||
self.semaphore = asyncio.Semaphore(max_concurrent)
|
||||
|
||||
print(f"[调度器] 已创建,最大并发: {max_concurrent}", file=sys.stderr)
|
||||
|
||||
    def start(self, cron_expr: str = "*/5 * * * * *"):
        """
        Start the scheduled job.

        Args:
            cron_expr: Cron expression; defaults to running every 5 seconds.
                Format: second minute hour day month day-of-week
        """
        # Parse the cron expression
        parts = cron_expr.split()
        if len(parts) == 6:
            # 6-field format: second minute hour day month day-of-week
            trigger = CronTrigger(
                second=parts[0],
                minute=parts[1],
                hour=parts[2],
                day=parts[3],
                month=parts[4],
                day_of_week=parts[5]
            )
        else:
            print(f"[Scheduler] ⚠️ Invalid cron expression: {cron_expr}; using the default", file=sys.stderr)
            trigger = CronTrigger(second="*/5")

        self.scheduler.add_job(
            self.auto_publish_articles,
            trigger=trigger,
            id='xhs_publish',
            name='Xiaohongshu auto publish',
            max_instances=1,  # Allow only one running instance to prevent duplicate runs
            replace_existing=True  # Replace an existing job so restarts do not add it twice
        )

        self.scheduler.start()
        print(f"[Scheduler] Scheduled publish job started, cron expression: {cron_expr}", file=sys.stderr)
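
The `start` method maps a six-field cron string positionally onto `CronTrigger` keyword arguments and falls back to every 5 seconds when the field count is wrong. A minimal sketch of that mapping as a pure function follows. `cron_to_trigger_kwargs` and `CRON_FIELDS` are hypothetical names introduced for illustration; the sketch only splits the string and does not validate the individual field values.

```python
CRON_FIELDS = ("second", "minute", "hour", "day", "month", "day_of_week")


def cron_to_trigger_kwargs(cron_expr: str) -> dict:
    """Map 'sec min hour day month dow' to keyword arguments for a cron trigger."""
    parts = cron_expr.split()
    if len(parts) != len(CRON_FIELDS):
        # Same fallback as the scheduler: run every 5 seconds
        return {"second": "*/5"}
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    print(cron_to_trigger_kwargs("0 */10 * * * *"))
    print(cron_to_trigger_kwargs("*/5 * * * *"))  # 5 fields, so the fallback applies
```

Under this assumption the result could be passed on as `CronTrigger(**cron_to_trigger_kwargs(expr))`. Note that a standard five-field cron line does not match the expected format and silently takes the fallback.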
|
||||
|
||||
def stop(self):
|
||||
"""停止定时任务"""
|
||||
self.scheduler.shutdown()
|
||||
print("[调度器] 定时发布任务已停止", file=sys.stderr)
|
||||
|
||||
def get_db_connection(self):
|
||||
"""获取数据库连接"""
|
||||
return pymysql.connect(
|
||||
host=self.db_config['host'],
|
||||
port=self.db_config['port'],
|
||||
user=self.db_config['user'],
|
||||
password=self.db_config['password'],
|
||||
database=self.db_config['database'],
|
||||
charset='utf8mb4',
|
||||
cursorclass=pymysql.cursors.DictCursor
|
||||
)
|
||||
|
||||
    async def _fetch_proxy_from_pool(self) -> Optional[str]:
        """Fetch one proxy address (http://ip:port) from the proxy pool API."""
        if not self.proxy_pool_enabled or not self.proxy_pool_api_url:
            return None

        try:
            timeout = aiohttp.ClientTimeout(total=10)
            async with aiohttp.ClientSession(timeout=timeout) as session:
                async with session.get(self.proxy_pool_api_url) as resp:
                    if resp.status != 200:
                        print(f"[Scheduler] Proxy pool API returned non-200 status: {resp.status}", file=sys.stderr)
                        return None

                    text = (await resp.text()).strip()
                    if not text:
                        print("[Scheduler] Proxy pool returned an empty body", file=sys.stderr)
                        return None

                    # Only the first line of the response is used
                    line = text.splitlines()[0].strip()
                    if not line:
                        print("[Scheduler] First line of the proxy pool response is empty", file=sys.stderr)
                        return None

                    if line.startswith("http://") or line.startswith("https://"):
                        return line
                    return "http://" + line
        except Exception as e:
            print(f"[Scheduler] Proxy pool request failed: {str(e)}", file=sys.stderr)
            return None
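
The response handling in `_fetch_proxy_from_pool` can be exercised without a network call by isolating the text normalization. The sketch below assumes, as the method does, that the pool API returns plain text with one `ip:port` entry per line. `normalize_proxy_response` is a hypothetical helper name, not part of the commit.

```python
from typing import Optional


def normalize_proxy_response(text: str) -> Optional[str]:
    """Take the first line of a proxy pool response and ensure it has a scheme."""
    text = text.strip()
    if not text:
        return None
    line = text.splitlines()[0].strip()
    if not line:
        return None
    if line.startswith(("http://", "https://")):
        return line
    return "http://" + line


if __name__ == "__main__":
    print(normalize_proxy_response("192.0.2.10:8080\n192.0.2.11:9090\n"))
    print(normalize_proxy_response("   "))
```

Keeping this step separate makes it easy to cover the empty-body and multi-line cases in unit tests.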
|
||||
|
||||
    def _generate_random_user_agent(self) -> str:
        """Generate a random User-Agent to reduce browser fingerprinting."""
        chrome_versions = ['120.0.0.0', '119.0.0.0', '118.0.0.0', '117.0.0.0', '116.0.0.0']
        # Chrome reports "Windows NT 10.0" on both Windows 10 and Windows 11;
        # "Windows NT 11.0" is not a token real browsers send, so it is not offered here.
        windows_versions = ['Windows NT 10.0; Win64; x64']

        chrome_ver = random.choice(chrome_versions)
        win_ver = random.choice(windows_versions)

        return f'Mozilla/5.0 ({win_ver}) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{chrome_ver} Safari/537.36'
|
||||
|
||||
async def auto_publish_articles(self):
|
||||
"""自动发布文案(定时任务主函数)"""
|
||||
print("========== 开始执行定时发布任务 ==========", file=sys.stderr)
|
||||
start_time = datetime.now()
|
||||
|
||||
try:
|
||||
conn = self.get_db_connection()
|
||||
cursor = conn.cursor()
|
||||
|
||||
# 1. 查询所有待发布的文案
|
||||
cursor.execute("""
|
||||
SELECT * FROM ai_articles
|
||||
WHERE status = 'published_review'
|
||||
ORDER BY id ASC
|
||||
""")
|
||||
articles = cursor.fetchall()
|
||||
|
||||
if not articles:
|
||||
print("没有待发布的文案", file=sys.stderr)
|
||||
cursor.close()
|
||||
conn.close()
|
||||
return
|
||||
|
||||
original_total = len(articles)
|
||||
|
||||
# 2. 限制每用户每轮发文数
|
||||
articles = self._limit_articles_per_user(articles, self.max_articles_per_user_per_run)
|
||||
print(f"找到 {original_total} 篇待发布文案,按照每个用户每轮最多 {self.max_articles_per_user_per_run} 篇,本次计划发布 {len(articles)} 篇", file=sys.stderr)
|
||||
|
||||
# 3. 应用每日/每小时上限过滤
|
||||
if self.max_daily_articles_per_user > 0 or self.max_hourly_articles_per_user > 0:
|
||||
before_count = len(articles)
|
||||
articles = await self._filter_by_daily_and_hourly_limit(
|
||||
cursor, articles,
|
||||
self.max_daily_articles_per_user,
|
||||
self.max_hourly_articles_per_user
|
||||
)
|
||||
print(f"应用每日/每小时上限过滤:过滤前 {before_count} 篇,过滤后 {len(articles)} 篇", file=sys.stderr)
|
||||
|
||||
if not articles:
|
||||
print("所有文案均因频率限制被过滤,本轮无任务", file=sys.stderr)
|
||||
cursor.close()
|
||||
conn.close()
|
||||
return
|
||||
|
||||
# 4. 并发发布
|
||||
tasks = []
|
||||
user_fail_count = {}
|
||||
paused_users = set()
|
||||
|
||||
for article in articles:
|
||||
user_id = article['publish_user_id'] or article['created_user_id']
|
||||
|
||||
if user_id in paused_users:
|
||||
print(f"用户 {user_id} 在本轮已暂停,跳过文案 ID: {article['id']}", file=sys.stderr)
|
||||
continue
|
||||
|
||||
# 直接发布,不在这里延迟
|
||||
task = asyncio.create_task(
|
||||
self._publish_article_with_semaphore(
|
||||
article, user_id, cursor, user_fail_count, paused_users
|
||||
)
|
||||
)
|
||||
tasks.append(task)
|
||||
|
||||
# 等待所有发布完成
|
||||
results = await asyncio.gather(*tasks, return_exceptions=True)
|
||||
|
||||
# 统计结果
|
||||
success_count = sum(1 for r in results if r is True)
|
||||
fail_count = len(results) - success_count
|
||||
|
||||
cursor.close()
|
||||
conn.close()
|
||||
|
||||
duration = (datetime.now() - start_time).total_seconds()
|
||||
print("========== 定时发布任务完成 ==========", file=sys.stderr)
|
||||
print(f"总计: {len(articles)} 篇, 成功: {success_count} 篇, 失败: {fail_count} 篇, 耗时: {duration:.1f}秒", file=sys.stderr)
|
||||
|
||||
except Exception as e:
|
||||
print(f"[调度器] 定时任务异常: {str(e)}", file=sys.stderr)
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
|
||||
    async def _publish_article_with_semaphore(self, article: Dict, user_id: int,
                                              cursor, user_fail_count: Dict, paused_users: set):
        """Publish one article under the concurrency semaphore."""
        async with self.semaphore:
            # Re-check the pause flag here: every task is created before any of them
            # runs, so the check in the caller's loop never sees a paused user.
            if user_id in paused_users:
                print(f"User {user_id} is paused for this run, skipping article ID: {article['id']}", file=sys.stderr)
                return False

            try:
                print(f"[Scheduler] Publishing article {article['id']}: {article['title']}", file=sys.stderr)
                success = await self._publish_single_article(article, cursor)

                if not success:
                    user_fail_count[user_id] = user_fail_count.get(user_id, 0) + 1
                    if user_fail_count[user_id] >= self.max_failures_per_user_per_run:
                        paused_users.add(user_id)
                        print(f"User {user_id} reached {user_fail_count[user_id]} failures in this run, pausing further publishing for this run", file=sys.stderr)
                    print(f"Publish failed [article ID: {article['id']}, title: {article['title']}]", file=sys.stderr)
                    return False
                else:
                    print(f"Publish succeeded [article ID: {article['id']}, title: {article['title']}]", file=sys.stderr)
                    return True

            except Exception as e:
                print(f"Publish error [article ID: {article['id']}]: {str(e)}", file=sys.stderr)
                return False
|
||||
|
||||
async def _publish_single_article(self, article: Dict, cursor) -> bool:
|
||||
"""发布单篇文章"""
|
||||
try:
|
||||
# 1. 获取用户信息
|
||||
user_id = article['publish_user_id'] or article['created_user_id']
|
||||
cursor.execute("SELECT * FROM ai_users WHERE id = %s", (user_id,))
|
||||
user = cursor.fetchone()
|
||||
|
||||
if not user:
|
||||
self._update_article_status(cursor, article['id'], 'failed', '获取用户信息失败')
|
||||
return False
|
||||
|
||||
# 2. 检查用户是否绑定小红书
|
||||
if user['is_bound_xhs'] != 1:
|
||||
self._update_article_status(cursor, article['id'], 'failed', '用户未绑定小红书账号')
|
||||
return False
|
||||
|
||||
# 3. 获取author记录和Cookie
|
||||
cursor.execute("""
|
||||
SELECT * FROM ai_authors
|
||||
WHERE phone = %s AND enterprise_id = %s AND channel = 1 AND status = 'active'
|
||||
LIMIT 1
|
||||
""", (user['phone'], user['enterprise_id']))
|
||||
author = cursor.fetchone()
|
||||
|
||||
if not author or not author['xhs_cookie']:
|
||||
self._update_article_status(cursor, article['id'], 'failed', '小红书Cookie已失效')
|
||||
return False
|
||||
|
||||
# 4. 获取文章图片
|
||||
cursor.execute("""
|
||||
SELECT image_url FROM ai_article_images
|
||||
WHERE article_id = %s
|
||||
ORDER BY sort_order ASC
|
||||
""", (article['id'],))
|
||||
images = [img['image_url'] for img in cursor.fetchall() if img['image_url']]
|
||||
|
||||
# 5. 获取标签
|
||||
cursor.execute("SELECT coze_tag FROM ai_article_tags WHERE article_id = %s LIMIT 1", (article['id'],))
|
||||
tag_row = cursor.fetchone()
|
||||
topics = []
|
||||
if tag_row and tag_row['coze_tag']:
|
||||
topics = self._parse_tags(tag_row['coze_tag'])
|
||||
|
||||
# 6. 解析Cookie并格式化
|
||||
try:
|
||||
# 数据库中存储的是完整的login_state JSON
|
||||
login_state = json.loads(author['xhs_cookie'])
|
||||
|
||||
# 处理双重JSON编码的情况
|
||||
if isinstance(login_state, str):
|
||||
login_state = json.loads(login_state)
|
||||
|
||||
# 提取cookies字段(兼容旧格式:如果login_state本身就是cookies列表)
|
||||
if isinstance(login_state, dict) and 'cookies' in login_state:
|
||||
# 新格式:login_state对象包含cookies字段
|
||||
cookies = login_state['cookies']
|
||||
print(f" 从login_state提取cookies: {len(cookies) if isinstance(cookies, list) else 'unknown'} 个", file=sys.stderr)
|
||||
elif isinstance(login_state, (list, dict)):
|
||||
# 旧格式:直接是cookies
|
||||
cookies = login_state
|
||||
print(f" 使用旧格式cookies(无login_state包装)", file=sys.stderr)
|
||||
else:
|
||||
raise ValueError(f"无法识别的Cookie存储格式: {type(login_state).__name__}")
|
||||
|
||||
# 验证cookies格式
|
||||
if not isinstance(cookies, (list, dict)):
|
||||
raise ValueError(f"Cookie必须是列表或字典格式,当前类型: {type(cookies).__name__}")
|
||||
|
||||
# 格式化Cookie,确保包含domain字段
|
||||
cookies = self._format_cookies(cookies)
|
||||
except Exception as e:
|
||||
self._update_article_status(cursor, article['id'], 'failed', f'Cookie格式错误: {str(e)}')
|
||||
return False
|
||||
|
||||
# 7. 从代理池获取代理(如果启用)
|
||||
proxy = await self._fetch_proxy_from_pool()
|
||||
if proxy:
|
||||
print(f"[调度器] 使用代理: {proxy}", file=sys.stderr)
|
||||
|
||||
# 8. 生成随机User-Agent(防指纹识别)
|
||||
user_agent = self._generate_random_user_agent() if self.enable_random_ua else None
|
||||
if user_agent:
|
||||
print(f"[调度器] 使用随机UA: {user_agent[:50]}...", file=sys.stderr)
|
||||
|
||||
# 9. 调用发布服务(增加超时控制)
|
||||
try:
|
||||
print(f"[调度器] 开始调用发布服务,超时设置: {self.publish_timeout}秒", file=sys.stderr)
|
||||
result = await asyncio.wait_for(
|
||||
self.login_service.publish_note(
|
||||
title=article['title'],
|
||||
content=article['content'],
|
||||
images=images,
|
||||
topics=topics,
|
||||
cookies=cookies,
|
||||
proxy=proxy,
|
||||
user_agent=user_agent,
|
||||
),
|
||||
timeout=self.publish_timeout
|
||||
)
|
||||
except asyncio.TimeoutError:
|
||||
error_msg = f'发布超时({self.publish_timeout}秒)'
|
||||
print(f"[调度器] {error_msg}", file=sys.stderr)
|
||||
self._update_article_status(cursor, article['id'], 'failed', error_msg)
|
||||
return False
|
||||
except Exception as e:
|
||||
error_msg = f'调用发布服务异常: {str(e)}'
|
||||
print(f"[调度器] {error_msg}", file=sys.stderr)
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
self._update_article_status(cursor, article['id'], 'failed', error_msg)
|
||||
return False
|
||||
|
||||
# 10. 更新状态
|
||||
if result['success']:
|
||||
self._update_article_status(cursor, article['id'], 'published', '发布成功')
|
||||
return True
|
||||
else:
|
||||
error_msg = result.get('error', '未知错误')
|
||||
self._update_article_status(cursor, article['id'], 'failed', error_msg)
|
||||
return False
|
||||
|
||||
except Exception as e:
|
||||
self._update_article_status(cursor, article['id'], 'failed', f'发布异常: {str(e)}')
|
||||
return False
|
||||
|
||||
def _update_article_status(self, cursor, article_id: int, status: str, message: str = ''):
|
||||
"""更新文章状态"""
|
||||
try:
|
||||
if status == 'published':
|
||||
cursor.execute("""
|
||||
UPDATE ai_articles
|
||||
SET status = %s, publish_time = NOW(), updated_at = NOW()
|
||||
WHERE id = %s
|
||||
""", (status, article_id))
|
||||
else:
|
||||
cursor.execute("""
|
||||
UPDATE ai_articles
|
||||
SET status = %s, review_comment = %s, updated_at = NOW()
|
||||
WHERE id = %s
|
||||
""", (status, message, article_id))
|
||||
cursor.connection.commit()
|
||||
except Exception as e:
|
||||
print(f"更新文章 {article_id} 状态失败: {str(e)}", file=sys.stderr)
|
||||
|
||||
    def _limit_articles_per_user(self, articles: List[Dict], per_user_limit: int) -> List[Dict]:
        """Cap the number of articles per user for one run."""
        if per_user_limit <= 0:
            return articles

        # Group articles by the effective publishing user
        grouped = {}
        for art in articles:
            user_id = art['publish_user_id'] or art['created_user_id']
            if user_id not in grouped:
                grouped[user_id] = []
            grouped[user_id].append(art)

        # Keep at most per_user_limit articles for each user
        limited = []
        for user_id, user_articles in grouped.items():
            limited.extend(user_articles[:per_user_limit])

        return limited
|
||||
|
||||
    async def _filter_by_daily_and_hourly_limit(self, cursor, articles: List[Dict],
                                                max_daily: int, max_hourly: int) -> List[Dict]:
        """Filter articles by the per-user daily and hourly caps."""
        if max_daily <= 0 and max_hourly <= 0:
            return articles

        # Collect all user IDs
        user_ids = set()
        for art in articles:
            user_id = art['publish_user_id'] or art['created_user_id']
            user_ids.add(user_id)

        # Count what each user has already published
        user_daily_published = {}
        user_hourly_published = {}

        now = datetime.now()
        today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
        current_hour_start = now.replace(minute=0, second=0, microsecond=0)

        for user_id in user_ids:
            # Articles published today
            if max_daily > 0:
                cursor.execute("""
                    SELECT COUNT(*) as count FROM ai_articles
                    WHERE status = 'published' AND publish_time >= %s
                    AND (publish_user_id = %s OR (publish_user_id IS NULL AND created_user_id = %s))
                """, (today_start, user_id, user_id))
                user_daily_published[user_id] = cursor.fetchone()['count']

            # Articles published in the current hour
            if max_hourly > 0:
                cursor.execute("""
                    SELECT COUNT(*) as count FROM ai_articles
                    WHERE status = 'published' AND publish_time >= %s
                    AND (publish_user_id = %s OR (publish_user_id IS NULL AND created_user_id = %s))
                """, (current_hour_start, user_id, user_id))
                user_hourly_published[user_id] = cursor.fetchone()['count']

        # Drop articles over a cap. Articles accepted in this run are counted as well,
        # so a single run cannot push a user past the daily or hourly cap.
        filtered = []
        for art in articles:
            user_id = art['publish_user_id'] or art['created_user_id']
            daily = user_daily_published.get(user_id, 0)
            hourly = user_hourly_published.get(user_id, 0)

            # Check the daily cap
            if max_daily > 0 and daily >= max_daily:
                continue

            # Check the hourly cap
            if max_hourly > 0 and hourly >= max_hourly:
                continue

            filtered.append(art)
            user_daily_published[user_id] = daily + 1
            user_hourly_published[user_id] = hourly + 1

        return filtered
|
||||
|
||||
    def _parse_tags(self, tag_str: str) -> List[str]:
        """Parse a tag string into a list of tags."""
        if not tag_str:
            return []

        # Normalize the separators (semicolon, space, ideographic comma) to commas
        tag_str = tag_str.replace(';', ',').replace(' ', ',').replace('、', ',')

        # Split and drop empty entries
        tags = []
        for tag in tag_str.split(','):
            tag = tag.strip()
            if tag:
                tags.append(tag)

        return tags
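
Tag strings stored in `coze_tag` may mix several separators. The sketch below is one possible variant of the parsing above that uses a single regular expression. It is an assumption rather than the commit's behaviour in two respects: it also splits on the full-width comma `,` and full-width semicolon `;`, and it treats any whitespace (not only spaces) as a separator. `parse_tags` is a hypothetical standalone name.

```python
import re
from typing import List

# Separators: ASCII comma/semicolon, full-width comma/semicolon,
# ideographic comma, and any whitespace
_TAG_SEPARATORS = re.compile(r"[,;,;、\s]+")


def parse_tags(tag_str: str) -> List[str]:
    """Split a tag string on common separators and drop empty entries."""
    if not tag_str:
        return []
    return [tag for tag in _TAG_SEPARATORS.split(tag_str) if tag]


if __name__ == "__main__":
    print(parse_tags("美食;旅行 摄影、咖啡,,"))
```

Splitting on a character class keeps the separator list in one place, which makes it simpler to extend if other delimiters show up in stored data.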
|
||||
|
||||
    def _format_cookies(self, cookies) -> List[Dict]:
        """
        Normalize cookies, touching only non-standard formats.
        Cookies already in Playwright's native format are returned unchanged.

        Args:
            cookies: Cookie data, either a list[dict] or a name -> value dict

        Returns:
            The normalized cookie list
        """
        # A plain name -> value dict is converted to the list format
        if isinstance(cookies, dict):
            cookies = [
                {
                    "name": name,
                    "value": value if isinstance(value, str) else str(value),
                    "domain": ".xiaohongshu.com",
                    "path": "/"
                }
                for name, value in cookies.items()
            ]

        # Anything that is not a list at this point is rejected
        if not isinstance(cookies, list):
            raise ValueError(f"Cookies must be a list or a dict, got: {type(cookies).__name__}")

        # Nothing to do for an empty list
        if not cookies:
            print(" Cookie list is empty, returning it as-is", file=sys.stderr)
            return cookies

        # Playwright native format (first entry has name and value): return unchanged
        if isinstance(cookies[0], dict) and 'name' in cookies[0] and 'value' in cookies[0]:
            print(f" Detected Playwright native format, using as-is ({len(cookies)} cookies)", file=sys.stderr)
            return cookies

        # Any other format: basic validation plus default fields
        formatted_cookies = []
        for cookie in cookies:
            if not isinstance(cookie, dict):
                raise ValueError(f"Each cookie must be a dict, got: {type(cookie).__name__}")

            # Entries without a url get a default domain and path;
            # copy first so the caller's dict is not mutated
            if 'url' not in cookie:
                cookie = cookie.copy()
                cookie.setdefault('domain', '.xiaohongshu.com')
                cookie.setdefault('path', '/')

            formatted_cookies.append(cookie)

        return formatted_cookies
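Cookies read back from storage may arrive as a JSON string, possibly encoded twice, before they reach `_format_cookies`. The helper below is a minimal sketch of unwrapping such a payload; `decode_cookie_payload` and `max_depth` are hypothetical names, and how the scheduler actually loads cookies is not shown in this part of the diff.

```python
import json
from typing import Union


def decode_cookie_payload(raw, max_depth: int = 3) -> Union[list, dict]:
    """Decode JSON repeatedly until the value is no longer a string.

    Handles payloads that were passed through json.dumps more than once.
    Raises ValueError if the result is not a list or dict
    (json.JSONDecodeError is itself a ValueError subclass).
    """
    value = raw
    for _ in range(max_depth):
        if not isinstance(value, str):
            break
        value = json.loads(value)
    if not isinstance(value, (list, dict)):
        raise ValueError(f"Unsupported cookie payload type: {type(value).__name__}")
    return value
```

The decoded list or dict can then be handed to `_format_cookies` for normalization.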
|
||||
55
backend/simple_proxy_test.py
Normal file
@@ -0,0 +1,55 @@
|
||||
import time

import requests
from damai_proxy_config import get_proxy_config


def test_single_proxy(index):
    """Test a single proxy."""
    try:
        # Load the proxy configuration
        proxy_info = get_proxy_config(index)
        proxy_server = proxy_info['server'].replace('http://', '')
        proxy_url = f"http://{proxy_info['username']}:{proxy_info['password']}@{proxy_server}"

        proxies = {
            'http': proxy_url,
            'https': proxy_url
        }

        print(f'🔍 Testing proxy {index + 1}: {proxy_info["server"]}')

        # Test the connection
        response = requests.get('http://httpbin.org/ip', proxies=proxies, timeout=10)

        if response.status_code == 200:
            print(f'✅ Proxy {index + 1} connected! Status code: {response.status_code}')
            print(f'🌐 IP info: {response.text}')
            return True
        else:
            print(f'❌ Proxy {index + 1} failed! Status code: {response.status_code}')
            return False

    except requests.exceptions.ProxyError:
        print(f'❌ Proxy {index + 1} error: cannot connect to the proxy server')
        return False
    except requests.exceptions.ConnectTimeout:
        print(f'❌ Proxy {index + 1} connection timed out')
        return False
    except Exception as e:
        print(f'❌ Proxy {index + 1} failed: {str(e)}')
        return False


if __name__ == "__main__":
    print("🚀 Testing fixed proxy IP connectivity\n")

    # Test both proxies
    for i in range(2):
        success = test_single_proxy(i)
        if success:
            print(f"✅ Proxy {i+1} is usable for Xiaohongshu login and publishing\n")
        else:
            print(f"❌ Proxy {i+1} is not usable\n")

        if i == 0:  # Pause briefly before testing the second proxy
            time.sleep(2)

    print("Test finished!")
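The proxy URL above is assembled with plain string formatting, which breaks if the username or password contains characters such as `@` or `:`. A minimal sketch of a safer builder follows; `build_proxy_url` is a hypothetical helper, and it assumes the HTTP client percent-decodes credentials embedded in the proxy URL.

```python
from urllib.parse import quote


def build_proxy_url(server: str, username: str, password: str) -> str:
    """Build an http:// proxy URL with percent-encoded credentials."""
    # Drop any scheme prefix so only host:port remains
    host = server.split("://", 1)[-1]
    # safe='' makes quote() encode every reserved character, including '@' and ':'
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}"
```

The result can be used for both the `http` and `https` entries of the `proxies` dict.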
|
||||
@@ -1,8 +1,9 @@
|
||||
@echo off
|
||||
echo 正在激活虚拟环境...
|
||||
venv\Scripts\activate
|
||||
call venv\Scripts\activate.bat
|
||||
|
||||
echo 正在启动小红书登录服务...
|
||||
echo 正在启动小红书登录服务(开发环境)...
|
||||
set "ENV=dev"
|
||||
python main.py
|
||||
|
||||
pause
|
||||
|
||||
@@ -1,7 +1,33 @@
|
||||
#!/bin/bash
|
||||
# 小红书Python服务启动脚本(开发环境)
|
||||
# 用途:前台启动,方便查看日志
|
||||
|
||||
echo "正在激活虚拟环境..."
|
||||
cd "$(dirname "$0")"
|
||||
|
||||
echo "========================================"
|
||||
echo " 小红书登录服务(开发模式)"
|
||||
echo "========================================"
|
||||
echo ""
|
||||
|
||||
# 激活虚拟环境
|
||||
echo "[环境] 激活虚拟环境: $(pwd)/venv"
|
||||
source venv/bin/activate
|
||||
if [ $? -ne 0 ]; then
|
||||
echo "[错误] 虚拟环境激活失败"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "正在启动小红书登录服务..."
|
||||
python main.py
|
||||
# 显示Python版本和路径
|
||||
echo "[Python] $(python --version)"
|
||||
echo "[路径] $(which python)"
|
||||
echo ""
|
||||
|
||||
echo "[启动] 正在启动Python服务(端口8000)..."
|
||||
echo "[说明] 按Ctrl+C停止服务"
|
||||
echo ""
|
||||
|
||||
# 设置环境为开发环境
|
||||
export ENV=dev
|
||||
|
||||
# 启动服务(开发模式,不使用reload)
|
||||
python -m uvicorn main:app --host 0.0.0.0 --port 8000
|
||||
|
||||
9
backend/start_prod.bat
Normal file
@@ -0,0 +1,9 @@
|
||||
@echo off
|
||||
echo 正在激活虚拟环境...
|
||||
call venv\Scripts\activate.bat
|
||||
|
||||
echo 正在启动小红书登录服务(生产环境)...
|
||||
set "ENV=prod"
|
||||
python main.py
|
||||
|
||||
pause
|
||||
32
backend/start_prod.sh
Normal file
@@ -0,0 +1,32 @@
|
||||
#!/bin/bash
|
||||
# 小红书Python服务启动脚本(生产环境)
|
||||
|
||||
cd "$(dirname "$0")"
|
||||
|
||||
echo "========================================"
|
||||
echo " 小红书登录服务(生产模式)"
|
||||
echo "========================================"
|
||||
echo ""
|
||||
|
||||
# 激活虚拟环境
|
||||
echo "[环境] 激活虚拟环境: $(pwd)/venv"
|
||||
source venv/bin/activate
|
||||
if [ $? -ne 0 ]; then
|
||||
echo "[错误] 虚拟环境激活失败"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# 显示Python版本和路径
|
||||
echo "[Python] $(python --version)"
|
||||
echo "[路径] $(which python)"
|
||||
echo ""
|
||||
|
||||
echo "[启动] 正在启动Python服务(生产环境,端口8000)..."
|
||||
echo "[说明] 按Ctrl+C停止服务"
|
||||
echo ""
|
||||
|
||||
# 设置环境为生产环境
|
||||
export ENV=prod
|
||||
|
||||
# 启动服务(生产模式)
|
||||
python -m uvicorn main:app --host 0.0.0.0 --port 8000
|
||||
66
backend/start_service.bat
Normal file
@@ -0,0 +1,66 @@
|
||||
@echo off
|
||||
setlocal enabledelayedexpansion
|
||||
chcp 65001 >nul
|
||||
echo ====================================
|
||||
echo 小红书登录服务(浏览器池模式)
|
||||
echo ====================================
|
||||
echo.
|
||||
|
||||
cd /d %~dp0
|
||||
|
||||
REM 检查虚拟环境
|
||||
if not exist "venv\Scripts\python.exe" (
|
||||
echo [错误] 未找到虚拟环境,请先运行: python -m venv venv
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
REM 检查并清理端口8000占用
|
||||
echo [检查] 正在检查端口8000占用情况...
|
||||
for /f "tokens=5" %%a in ('netstat -ano ^| findstr :8000 ^| findstr LISTENING') do (
|
||||
echo [清理] 发现端口8000被进程%%a占用,正在清理...
|
||||
taskkill /F /PID %%a >nul 2>&1
|
||||
if !errorlevel! equ 0 (
|
||||
echo [成功] 已清理进程%%a
|
||||
) else (
|
||||
echo [警告] 无法清理进程%%a,可能需要管理员权限
|
||||
)
|
||||
)
|
||||
|
||||
REM 等待端口释放
|
||||
timeout /t 1 /nobreak >nul
|
||||
|
||||
echo.
|
||||
echo [启动] 正在启动Python服务(端口8000)...
|
||||
echo [模式] 浏览器池模式 - 性能优化
|
||||
echo [说明] 浏览器实例将在30分钟无操作后自动清理
|
||||
echo.
|
||||
|
||||
REM 激活虚拟环境并启动服务
|
||||
echo [Environment] Using virtual environment: %~dp0venv
|
||||
call "%~dp0venv\Scripts\activate.bat"
|
||||
if !errorlevel! neq 0 (
|
||||
echo [错误] 虚拟环境激活失败
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
REM 显示Python版本和路径
|
||||
echo.
|
||||
echo [Python Version]
|
||||
python --version
|
||||
echo [Python Path]
|
||||
where python
|
||||
echo.
|
||||
|
||||
REM 确认使用虚拟环境的Python
|
||||
echo [Verify] Checking virtual environment...
|
||||
python -c "import sys; print('Python executable:', sys.executable)"
|
||||
echo.
|
||||
|
||||
REM 启动服务(使用虚拟环境的uvicorn)
|
||||
echo [Service] Starting FastAPI service...
|
||||
echo [Notice] Reload mode disabled for Windows compatibility
|
||||
python -m uvicorn main:app --host 0.0.0.0 --port 8000
|
||||
|
||||
pause
|
||||
50
backend/start_service.sh
Normal file
@@ -0,0 +1,50 @@
|
||||
#!/bin/bash
|
||||
echo "===================================="
|
||||
echo " 小红书登录服务(浏览器池模式)"
|
||||
echo "===================================="
|
||||
echo ""
|
||||
|
||||
cd "$(dirname "$0")"
|
||||
|
||||
# 检查虚拟环境
|
||||
if [ ! -f "venv/bin/python" ]; then
|
||||
echo "[错误] 未找到虚拟环境,请先运行: python3 -m venv venv"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# 检查并清理端口8000占用
|
||||
echo "[检查] 正在检查端口8000占用情况..."
|
||||
PID=$(lsof -ti:8000)
|
||||
if [ ! -z "$PID" ]; then
|
||||
echo "[清理] 发现端口8000被进程$PID占用,正在清理..."
|
||||
kill -9 $PID 2>/dev/null
|
||||
if [ $? -eq 0 ]; then
|
||||
echo "[成功] 已清理进程$PID"
|
||||
else
|
||||
echo "[警告] 无法清理进程$PID,可能需要sudo权限"
|
||||
fi
|
||||
sleep 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "[启动] 正在启动Python服务(端口8000)..."
|
||||
echo "[模式] 浏览器池模式 - 性能优化"
|
||||
echo "[说明] 浏览器实例将在30分钟无操作后自动清理"
|
||||
echo ""
|
||||
|
||||
# 激活虚拟环境
|
||||
echo "[环境] 激活虚拟环境: $(pwd)/venv"
|
||||
source venv/bin/activate
|
||||
if [ $? -ne 0 ]; then
|
||||
echo "[错误] 虚拟环境激活失败"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# 显示Python版本和路径
|
||||
echo "[Python] $(python --version)"
|
||||
echo "[路径] $(which python)"
|
||||
echo ""
|
||||
|
||||
# Start the service (using uvicorn from the virtual environment)
echo "[Notice] Reload mode is disabled"
python -m uvicorn main:app --host 0.0.0.0 --port 8000
|
||||
9
backend/stop.sh
Normal file
@@ -0,0 +1,9 @@
|
||||
#!/bin/bash
|
||||
# 小红书Python服务停止脚本
|
||||
# 用途:停止生产环境服务
|
||||
|
||||
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
|
||||
cd "$SCRIPT_DIR"
|
||||
|
||||
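# start_prod.sh runs uvicorn in the foreground and takes no "stop" argument,
# so stop whatever is listening on port 8000 instead
PID=$(lsof -ti:8000)
if [ -n "$PID" ]; then
    kill $PID
fi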
|
||||
137
backend/storage_state_manager.py
Normal file
@@ -0,0 +1,137 @@
|
||||
"""
|
||||
小红书Storage State文件管理工具
|
||||
用于管理和清理storage_state文件
|
||||
"""
|
||||
import os
|
||||
import json
|
||||
import time
|
||||
from datetime import datetime, timedelta
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
STORAGE_DIR = "storage_states"
|
||||
|
||||
|
||||
def get_storage_files():
|
||||
"""获取所有storage_state文件"""
|
||||
if not os.path.exists(STORAGE_DIR):
|
||||
return []
|
||||
|
||||
files = []
|
||||
for filename in os.listdir(STORAGE_DIR):
|
||||
if filename.endswith('.json'):
|
||||
filepath = os.path.join(STORAGE_DIR, filename)
|
||||
stat = os.stat(filepath)
|
||||
files.append({
|
||||
'filename': filename,
|
||||
'filepath': filepath,
|
||||
'size': stat.st_size,
|
||||
'modified_time': stat.st_mtime,
|
||||
'modified_date': datetime.fromtimestamp(stat.st_mtime)
|
||||
})
|
||||
return files
|
||||
|
||||
|
||||
def cleanup_old_files(days=30):
|
||||
"""清理超过指定天数未使用的文件"""
|
||||
files = get_storage_files()
|
||||
cutoff_time = time.time() - (days * 24 * 60 * 60)
|
||||
deleted_count = 0
|
||||
|
||||
print(f"\n开始清理{days}天前的storage_state文件...")
|
||||
for file_info in files:
|
||||
if file_info['modified_time'] < cutoff_time:
|
||||
try:
|
||||
os.remove(file_info['filepath'])
|
||||
print(f" 已删除: {file_info['filename']} (最后修改: {file_info['modified_date']})")
|
||||
deleted_count += 1
|
||||
except Exception as e:
|
||||
print(f" 删除失败 {file_info['filename']}: {e}")
|
||||
|
||||
print(f"\n清理完成!共删除 {deleted_count} 个文件")
|
||||
return deleted_count
|
||||
|
||||
|
||||
def list_storage_files():
|
||||
"""列出所有storage_state文件"""
|
||||
files = get_storage_files()
|
||||
|
||||
if not files:
|
||||
print("\n未找到任何storage_state文件")
|
||||
return
|
||||
|
||||
print(f"\n找到 {len(files)} 个storage_state文件:\n")
|
||||
print(f"{'文件名':<40} {'大小':<10} {'最后修改时间'}")
|
||||
print("-" * 80)
|
||||
|
||||
for file_info in sorted(files, key=lambda x: x['modified_time'], reverse=True):
|
||||
size_kb = file_info['size'] / 1024
|
||||
print(f"{file_info['filename']:<40} {size_kb:>8.1f}KB {file_info['modified_date']}")
|
||||
|
||||
total_size = sum(f['size'] for f in files) / 1024 / 1024
|
||||
print(f"\n总大小: {total_size:.2f} MB")
|
||||
|
||||
|
||||
def validate_storage_file(phone):
|
||||
"""验证指定手机号的storage_state文件是否有效"""
|
||||
filepath = os.path.join(STORAGE_DIR, f"xhs_{phone}.json")
|
||||
|
||||
if not os.path.exists(filepath):
|
||||
print(f"\n❌ 文件不存在: {filepath}")
|
||||
return False
|
||||
|
||||
try:
|
||||
with open(filepath, 'r', encoding='utf-8') as f:
|
||||
data = json.load(f)
|
||||
|
||||
# 检查必要字段
|
||||
if 'cookies' not in data:
|
||||
print(f"\n❌ 文件格式错误: 缺少cookies字段")
|
||||
return False
|
||||
|
||||
if 'origins' not in data:
|
||||
print(f"\n⚠️ 文件格式不完整: 缺少origins字段")
|
||||
|
||||
cookie_count = len(data.get('cookies', []))
|
||||
print(f"\n✅ 文件有效")
|
||||
print(f" Cookie数量: {cookie_count}")
|
||||
print(f" 文件大小: {os.path.getsize(filepath) / 1024:.1f}KB")
|
||||
print(f" 最后修改: {datetime.fromtimestamp(os.path.getmtime(filepath))}")
|
||||
|
||||
return True
|
||||
|
||||
except json.JSONDecodeError:
|
||||
print(f"\n❌ 文件格式错误: 不是有效的JSON")
|
||||
return False
|
||||
except Exception as e:
|
||||
print(f"\n❌ 验证失败: {e}")
|
||||
return False
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import sys
|
||||
|
||||
if len(sys.argv) < 2:
|
||||
print("用法:")
|
||||
print(" python storage_state_manager.py list # 列出所有文件")
|
||||
print(" python storage_state_manager.py cleanup [days] # 清理旧文件(默认30天)")
|
||||
print(" python storage_state_manager.py validate <phone> # 验证指定手机号的文件")
|
||||
sys.exit(1)
|
||||
|
||||
command = sys.argv[1]
|
||||
|
||||
if command == "list":
|
||||
list_storage_files()
|
||||
elif command == "cleanup":
|
||||
days = int(sys.argv[2]) if len(sys.argv) > 2 else 30
|
||||
cleanup_old_files(days)
|
||||
elif command == "validate":
|
||||
if len(sys.argv) < 3:
|
||||
print("错误: 请提供手机号")
|
||||
sys.exit(1)
|
||||
phone = sys.argv[2]
|
||||
validate_storage_file(phone)
|
||||
else:
|
||||
print(f"未知命令: {command}")
|
||||
sys.exit(1)
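Before deleting anything, it can help to list what `cleanup` would remove. The following is a minimal sketch of such a dry run using `pathlib`; `find_stale_files` and its `now` parameter are hypothetical and not part of the tool above.

```python
import time
from pathlib import Path
from typing import List, Optional


def find_stale_files(storage_dir: str, days: int = 30, now: Optional[float] = None) -> List[Path]:
    """Return the .json files not modified within the last `days` days.

    `now` can be injected to make the function deterministic in tests.
    """
    root = Path(storage_dir)
    if not root.is_dir():
        return []
    cutoff = (time.time() if now is None else now) - days * 24 * 60 * 60
    return sorted(p for p in root.glob("*.json") if p.stat().st_mtime < cutoff)
```

Printing the returned paths gives a preview; deleting them afterwards is a separate, explicit step.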
|
||||
|
||||
33
backend/test.py
Normal file
@@ -0,0 +1,33 @@
|
||||
#!/usr/bin/env python
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
"""
|
||||
使用requests请求代理服务器
|
||||
请求http和https网页均适用
|
||||
"""
|
||||
|
||||
import requests
|
||||
|
||||
|
||||
proxy_ip = "36.137.177.131:50001"
|
||||
|
||||
# 用户名密码认证(私密代理/独享代理)
|
||||
username = "qqwvy0"
|
||||
password = "mun3r7xz"
|
||||
proxies = {
|
||||
"http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
|
||||
"https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip}
|
||||
}
|
||||
|
||||
|
||||
print(proxies)
|
||||
|
||||
# 要访问的目标网页
|
||||
target_url = "https://creator.xiaohongshu.com/login"
|
||||
|
||||
# 使用代理IP发送请求
|
||||
response = requests.get(target_url, proxies=proxies)
|
||||
|
||||
# 获取页面内容
|
||||
if response.status_code == 200:
|
||||
print(response.text)
|
||||
170
backend/test_basic_browser.py
Normal file
@@ -0,0 +1,170 @@
|
||||
"""
|
||||
基础浏览器测试脚本
|
||||
用于测试浏览器是否能正常加载小红书页面
|
||||
"""
|
||||
import asyncio
|
||||
from playwright.async_api import async_playwright
|
||||
import sys
|
||||
|
||||
|
||||
async def test_basic_browser(proxy_index: int = 0):
|
||||
"""基础浏览器测试"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"🔍 基础浏览器测试")
|
||||
print(f"{'='*60}")
|
||||
|
||||
# 从代理配置获取代理信息
|
||||
from damai_proxy_config import get_proxy_config
|
||||
proxy_config = get_proxy_config(proxy_index)
|
||||
proxy_server = proxy_config['server'].replace('http://', '')
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
|
||||
|
||||
print(f"✅ 使用代理: 代理{proxy_index + 1}")
|
||||
print(f" 代理服务器: {proxy_config['server']}")
|
||||
|
||||
try:
|
||||
async with async_playwright() as p:
|
||||
# 配置代理
|
||||
proxy_parts = proxy_url.replace('http://', '').replace('https://', '').split('@')
|
||||
if len(proxy_parts) == 2:
|
||||
auth_part = proxy_parts[0]
|
||||
server_part = proxy_parts[1]
|
||||
username, password = auth_part.split(':', 1)
|
||||
|
||||
proxy_config_obj = {
|
||||
"server": f"http://{server_part}",
|
||||
"username": username,
|
||||
"password": password
|
||||
}
|
||||
else:
|
||||
proxy_config_obj = {"server": proxy_url}
|
||||
|
||||
print(f" 配置的代理对象: {proxy_config_obj}")
|
||||
|
||||
# 启动浏览器
|
||||
browser = await p.chromium.launch(
|
||||
headless=False, # 非无头模式,便于观察
|
||||
proxy=proxy_config_obj
|
||||
)
|
||||
|
||||
# 创建上下文
|
||||
context = await browser.new_context(
|
||||
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
|
||||
)
|
||||
|
||||
# 创建页面
|
||||
page = await context.new_page()
|
||||
|
||||
print(f"\n🌐 尝试访问百度...")
|
||||
try:
|
||||
await page.goto('https://www.baidu.com', wait_until='networkidle', timeout=15000)
|
||||
await asyncio.sleep(2)
|
||||
|
||||
title = await page.title()
|
||||
url = page.url
|
||||
content_len = len(await page.content())
|
||||
|
||||
print(f" ✅ 百度访问成功")
|
||||
print(f" 标题: {title}")
|
||||
print(f" URL: {url}")
|
||||
print(f" 内容长度: {content_len} 字符")
|
||||
except Exception as e:
|
||||
print(f" ❌ 百度访问失败: {str(e)}")
|
||||
|
||||
print(f"\n🌐 尝试访问小红书登录页...")
|
||||
try:
|
||||
await page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=15000)
|
||||
await asyncio.sleep(5) # 等待更长时间
|
||||
|
||||
title = await page.title()
|
||||
url = page.url
|
||||
content = await page.content()
|
||||
content_len = len(content)
|
||||
|
||||
print(f" 访问结果:")
|
||||
print(f" 标题: {title}")
|
||||
print(f" URL: {url}")
|
||||
print(f" 内容长度: {content_len} 字符")
|
||||
|
||||
# 检查是否有特定内容
|
||||
if content_len == 0:
|
||||
print(f" ⚠️ 页面内容为空,可能存在加载问题")
|
||||
elif "验证" in content or "captcha" in content.lower() or "安全" in content:
|
||||
print(f" ⚠️ 检测到验证或安全提示")
|
||||
else:
|
||||
print(f" ✅ 页面加载正常")
|
||||
|
||||
# 查找页面上的所有元素
|
||||
print(f"\n🔍 分析页面元素...")
|
||||
|
||||
# 查找所有input元素
|
||||
inputs = await page.query_selector_all('input')
|
||||
print(f" 找到 {len(inputs)} 个input元素")
|
||||
|
||||
# 查找所有表单相关元素
|
||||
form_elements = await page.query_selector_all('input, button, select, textarea')
|
||||
print(f" 找到 {len(form_elements)} 个表单相关元素")
|
||||
|
||||
# 打印前几个元素的信息
|
||||
for i, elem in enumerate(form_elements[:5]):
|
||||
try:
|
||||
tag = await elem.evaluate('el => el.tagName')
|
||||
text = await elem.inner_text()
|
||||
placeholder = await elem.get_attribute('placeholder')
|
||||
class_name = await elem.get_attribute('class')
|
||||
id_attr = await elem.get_attribute('id')
|
||||
|
||||
print(f" 元素 {i+1}:")
|
||||
print(f" - 标签: {tag}")
|
||||
print(f" - 文本: {text[:50]}...")
|
||||
print(f" - placeholder: {placeholder}")
|
||||
print(f" - class: {(class_name or '')[:50]}...")
|
||||
print(f" - id: {id_attr}")
|
||||
except Exception as e:
|
||||
print(f" 元素 {i+1}: 获取信息失败 - {str(e)}")
|
||||
|
||||
except Exception as e:
|
||||
print(f" ❌ 小红书访问失败: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
|
||||
print(f"\n⏸️ 浏览器保持打开状态,您可以手动检查页面")
|
||||
print(f" 按 Enter 键关闭浏览器...")
|
||||
|
||||
# 等待用户输入
|
||||
input()
|
||||
|
||||
await browser.close()
|
||||
print(f"✅ 浏览器已关闭")
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 测试过程异常: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
|
||||
|
||||
async def main():
|
||||
"""主函数"""
|
||||
print("="*60)
|
||||
print("🔍 基础浏览器测试工具")
|
||||
print("="*60)
|
||||
|
||||
proxy_choice = input("\n请选择代理 (0 或 1, 默认为0): ").strip()
|
||||
if proxy_choice not in ['0', '1']:
|
||||
proxy_choice = '0'
|
||||
proxy_idx = int(proxy_choice)
|
||||
|
||||
await test_basic_browser(proxy_idx)
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("✅ 测试完成!")
|
||||
print("="*60)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Windows环境下设置事件循环策略
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
# 运行测试
|
||||
asyncio.run(main())
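The test above first builds a proxy URL and then splits it apart again to obtain the dict Playwright expects. A minimal sketch of converting the config directly is shown below; `to_playwright_proxy` is a hypothetical helper, and it assumes the config dict carries the `server`, `username` and `password` keys used elsewhere in this commit.

```python
from typing import Dict


def to_playwright_proxy(cfg: Dict[str, str]) -> Dict[str, str]:
    """Convert a proxy config dict into Playwright's proxy settings dict."""
    server = cfg["server"]
    # Add a scheme when the config stores a bare host:port
    if "://" not in server:
        server = f"http://{server}"
    proxy = {"server": server}
    # Only include credentials when a username is configured
    if cfg.get("username"):
        proxy["username"] = cfg["username"]
        proxy["password"] = cfg.get("password", "")
    return proxy
```

The returned dict can be passed as the `proxy` argument of `p.chromium.launch(...)`, which avoids the string round-trip and any trouble with `:` or `@` inside the password.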
|
||||
213
backend/test_browser_pool_fix.py
Normal file
@@ -0,0 +1,213 @@
|
||||
"""
|
||||
测试修复后的浏览器池
|
||||
验证预热超时问题是否已解决
|
||||
"""
|
||||
import asyncio
|
||||
import sys
|
||||
from xhs_login import XHSLoginService
|
||||
|
||||
|
||||
async def test_browser_pool_with_proxy(proxy_index: int = 0):
|
||||
"""测试修复后的浏览器池"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"🔧 测试修复后的浏览器池")
|
||||
print(f"{'='*60}")
|
||||
|
||||
# 从代理配置获取代理信息
|
||||
from damai_proxy_config import get_proxy_config
|
||||
proxy_config = get_proxy_config(proxy_index)
|
||||
proxy_server = proxy_config['server'].replace('http://', '')
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
|
||||
|
||||
print(f"✅ 使用代理: 代理{proxy_index + 1}")
|
||||
print(f" 代理服务器: {proxy_config['server']}")
|
||||
print(f" 代理URL: {proxy_url}")
|
||||
|
||||
# 创建登录服务(使用浏览器池)
|
||||
login_service = XHSLoginService(use_pool=True) # 使用浏览器池
|
||||
|
||||
try:
|
||||
print(f"\n🚀 初始化浏览器(使用代理 + 浏览器池)...")
|
||||
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
|
||||
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
|
||||
print("✅ 浏览器初始化成功")
|
||||
|
||||
# 检查浏览器池状态
|
||||
browser_pool = login_service.browser_pool
|
||||
if browser_pool:
|
||||
stats = browser_pool.get_stats()
|
||||
print(f"\n📊 浏览器池状态:")
|
||||
print(f" 主浏览器存活: {stats['browser_alive']}")
|
||||
print(f" 上下文存活: {stats['context_alive']}")
|
||||
print(f" 页面存活: {stats['page_alive']}")
|
||||
print(f" 是否预热: {stats['is_preheated']}")
|
||||
print(f" 临时浏览器数: {stats['temp_browsers_count']}")
|
||||
|
||||
# 访问小红书登录页面
|
||||
print(f"\n🌐 访问小红书创作者平台...")
|
||||
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='domcontentloaded', timeout=30000)
|
||||
await asyncio.sleep(2)
|
||||
|
||||
title = await login_service.page.title()
|
||||
url = login_service.page.url
|
||||
content_len = len(await login_service.page.content())
|
||||
|
||||
print(f"✅ 访问成功")
|
||||
print(f" 标题: {title}")
|
||||
print(f" URL: {url}")
|
||||
print(f" 内容长度: {content_len} 字符")
|
||||
|
||||
# 检查关键元素
|
||||
phone_input = await login_service.page.query_selector('input[placeholder="手机号"]')
|
||||
if phone_input:
|
||||
print(f"✅ 找到手机号输入框")
|
||||
else:
|
||||
print(f"❌ 未找到手机号输入框")
|
||||
|
||||
# 查找所有input元素
|
||||
inputs = await login_service.page.query_selector_all('input')
|
||||
print(f" 共找到 {len(inputs)} 个input元素")
|
||||
|
||||
if content_len == 0:
|
||||
print(f"⚠️ 页面内容为空")
|
||||
else:
|
||||
print(f"✅ 页面内容正常加载")
|
||||
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 测试失败: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return False
|
||||
finally:
|
||||
await login_service.close_browser()
|
||||
|
||||
|
||||
async def test_multiple_requests(proxy_index: int = 0):
|
||||
"""测试多个请求复用浏览器池"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"🔄 测试浏览器池复用")
|
||||
print(f"{'='*60}")
|
||||
|
||||
from damai_proxy_config import get_proxy_config
|
||||
proxy_config = get_proxy_config(proxy_index)
|
||||
proxy_server = proxy_config['server'].replace('http://', '')
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
|
||||
|
||||
print(f"✅ 使用代理: 代理{proxy_index + 1}")
|
||||
|
||||
success_count = 0
|
||||
|
||||
for i in range(3):
|
||||
print(f"\n🧪 请求 {i+1}/3")
|
||||
login_service = XHSLoginService(use_pool=True)
|
||||
|
||||
try:
|
||||
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
|
||||
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
|
||||
|
||||
# 访问页面
|
||||
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='domcontentloaded', timeout=30000)
|
||||
await asyncio.sleep(1)
|
||||
|
||||
content_len = len(await login_service.page.content())
|
||||
if content_len > 0:
|
||||
print(f" ✅ 请求 {i+1} 成功,内容长度: {content_len}")
|
||||
success_count += 1
|
||||
else:
|
||||
print(f" ❌ 请求 {i+1} 失败,内容为空")
|
||||
|
||||
except Exception as e:
|
||||
print(f" ❌ 请求 {i+1} 异常: {str(e)}")
|
||||
finally:
|
||||
await login_service.close_browser()
|
||||
|
||||
# 等待一下避免请求过于频繁
|
||||
if i < 2:
|
||||
await asyncio.sleep(1)
|
||||
|
||||
print(f"\n📈 测试结果: {success_count}/3 请求成功")
|
||||
return success_count == 3
|
||||
|
||||
|
||||
def explain_fix():
|
||||
"""解释修复内容"""
|
||||
print("="*60)
|
||||
print("🔧 修复内容说明")
|
||||
print("="*60)
|
||||
|
||||
print("\n修复的两个问题:")
|
||||
print("1. 增加超时时间: 从30秒增加到45秒")
|
||||
print("2. 修改等待策略: 从'networkidle'改为'domcontentloaded'")
|
||||
print(" - 'networkidle': 等待网络空闲(可能等待时间过长)")
|
||||
print(" - 'domcontentloaded': DOM内容加载完成(更快更稳定)")
|
||||
|
||||
print("\n浏览器池优化效果:")
|
||||
print("✅ 减少预热超时错误")
|
||||
print("✅ 提高页面加载成功率")
|
||||
print("✅ 保持浏览器常驻,提升性能")
|
||||
|
||||
|
||||
async def main():
|
||||
"""主函数"""
|
||||
explain_fix()
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("🎯 选择测试模式")
|
||||
print("="*60)
|
||||
|
||||
print("\n1. 单次浏览器池测试")
|
||||
print("2. 多请求复用测试")
|
||||
print("3. 全部测试")
|
||||
|
||||
try:
|
||||
choice = input("\n请选择测试模式 (1-3, 默认为3): ").strip()
|
||||
|
||||
if choice not in ['1', '2', '3']:
|
||||
choice = '3'
|
||||
|
||||
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
|
||||
if proxy_choice not in ['0', '1']:
|
||||
proxy_choice = '0'
|
||||
proxy_idx = int(proxy_choice)
|
||||
|
||||
if choice in ['1', '3']:
|
||||
print(f"\n{'-'*40}")
|
||||
print("测试1: 单次浏览器池测试")
|
||||
success1 = await test_browser_pool_with_proxy(proxy_idx)
|
||||
|
||||
if choice in ['2', '3']:
|
||||
print(f"\n{'-'*40}")
|
||||
print("测试2: 多请求复用测试")
|
||||
success2 = await test_multiple_requests(proxy_idx)
|
||||
|
||||
if choice == '3':
|
||||
overall_success = success1 and success2
|
||||
elif choice == '1':
|
||||
overall_success = success1
|
||||
else:
|
||||
overall_success = success2
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
if overall_success:
|
||||
print("✅ 所有测试通过!浏览器池预热问题已修复")
|
||||
else:
|
||||
print("❌ 部分测试失败,请检查配置")
|
||||
print("="*60)
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("\n\n⚠️ 测试被用户中断")
|
||||
except Exception as e:
|
||||
print(f"\n❌ 测试过程中出现错误: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Windows环境下设置事件循环策略
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
# 运行测试
|
||||
asyncio.run(main())
|
||||
313
backend/test_cookie_format_fix.py
Normal file
@@ -0,0 +1,313 @@
|
||||
"""
|
||||
测试Cookie格式处理修复
|
||||
验证scheduler.py中的_format_cookies方法能正确处理各种Cookie格式
|
||||
"""
|
||||
import json
|
||||
from typing import List, Dict
|
||||
|
||||
|
||||
def _format_cookies(cookies) -> List[Dict]:
    """
    Normalize cookies, touching only non-standard formats.
    Cookies already in Playwright's native format are returned unchanged.

    This is a copy of the _format_cookies method in scheduler.py, kept here for standalone testing.

    Args:
        cookies: Cookie data, either a list[dict] or a name -> value dict

    Returns:
        The normalized cookie list
    """
    # A plain name -> value dict is converted to the list format
    if isinstance(cookies, dict):
        cookies = [
            {
                "name": name,
                "value": value if isinstance(value, str) else str(value),
                "domain": ".xiaohongshu.com",
                "path": "/"
            }
            for name, value in cookies.items()
        ]

    # Anything that is not a list at this point is rejected
    if not isinstance(cookies, list):
        raise ValueError(f"Cookies must be a list or a dict, got: {type(cookies).__name__}")

    # Playwright native format (first entry has name and value): return unchanged
    if cookies and isinstance(cookies[0], dict) and 'name' in cookies[0] and 'value' in cookies[0]:
        return cookies

    # Any other format: basic validation plus default fields
    formatted_cookies = []
    for cookie in cookies:
        if not isinstance(cookie, dict):
            raise ValueError(f"Each cookie must be a dict, got: {type(cookie).__name__}")

        # Entries without a url get a default domain and path;
        # copy first so the caller's dict is not mutated
        if 'url' not in cookie:
            cookie = cookie.copy()
            cookie.setdefault('domain', '.xiaohongshu.com')
            cookie.setdefault('path', '/')

        formatted_cookies.append(cookie)

    return formatted_cookies
|
||||
|
||||
|
||||
def test_format_cookies():
|
||||
"""测试_format_cookies方法"""
|
||||
|
||||
print("="*60)
|
||||
print("测试 Cookie 格式处理")
|
||||
print("="*60)
|
||||
|
||||
# 测试1: 字典格式(键值对)
|
||||
print("\n测试 1: 字典格式(键值对)")
|
||||
cookies_dict = {
|
||||
"a1": "xxx",
|
||||
"webId": "yyy",
|
||||
"web_session": "zzz"
|
||||
}
|
||||
try:
|
||||
result = _format_cookies(cookies_dict)
|
||||
print(f"✅ 成功处理字典格式")
|
||||
print(f" 输入: {type(cookies_dict).__name__} with {len(cookies_dict)} items")
|
||||
print(f" 输出: {type(result).__name__} with {len(result)} items")
|
||||
print(f" 第一个Cookie: {result[0]}")
|
||||
assert isinstance(result, list)
|
||||
assert len(result) == 3
|
||||
assert all('name' in c and 'value' in c and 'domain' in c for c in result)
|
||||
except Exception as e:
|
||||
print(f"❌ 失败: {str(e)}")
|
||||
|
||||
# 测试2: 列表格式(完整格式,已有domain和path)
|
||||
print("\n测试 2: 列表格式(完整格式)")
|
||||
cookies_list_full = [
|
||||
{
|
||||
"name": "a1",
|
||||
"value": "xxx",
|
||||
"domain": ".xiaohongshu.com",
|
||||
"path": "/",
|
||||
"expires": -1,
|
||||
"httpOnly": False,
|
||||
"secure": False,
|
||||
"sameSite": "Lax"
|
||||
}
|
||||
]
|
||||
try:
|
||||
result = _format_cookies(cookies_list_full)
|
||||
print(f"✅ 成功处理完整列表格式")
|
||||
print(f" 输入: {type(cookies_list_full).__name__} with {len(cookies_list_full)} items")
|
||||
print(f" 输出: {type(result).__name__} with {len(result)} items")
|
||||
# 验证Playwright原生格式被完整保留
|
||||
print(f" 保留的字段: {list(result[0].keys())}")
|
||||
assert result == cookies_list_full, "Playwright原生格式应该被完整保留,不做任何修改"
|
||||
assert 'expires' in result[0], "expires字段应该被保留"
|
||||
assert result[0]['expires'] == -1, "expires=-1应该被保留"
|
||||
assert isinstance(result, list)
|
||||
assert len(result) == 1
|
||||
except Exception as e:
|
||||
print(f"❌ 失败: {str(e)}")
|
||||
|
||||
# 测试3: 非Playwright格式(缺少name字段,需要补充domain和path)
|
||||
print("\n测试 3: 非Playwright格式(缺少字段,需要补充)")
|
||||
cookies_list_partial = [
|
||||
{
|
||||
"cookie_name": "a1", # 没有name字段,不是Playwright格式
|
||||
"cookie_value": "xxx"
|
||||
}
|
||||
]
|
||||
try:
|
||||
result = _format_cookies(cookies_list_partial)
|
||||
print(f"✅ 成功处理非Playwright格式")
|
||||
print(f" 输入: {type(cookies_list_partial).__name__} with {len(cookies_list_partial)} items")
|
||||
print(f" 输出: {type(result).__name__} with {len(result)} items")
|
||||
print(f" 自动添加的字段: domain={result[0].get('domain')}, path={result[0].get('path')}")
|
||||
assert isinstance(result, list)
|
||||
# 应该自动添加domain和path
|
||||
assert result[0]['domain'] == '.xiaohongshu.com'
|
||||
assert result[0]['path'] == '/'
|
||||
except Exception as e:
|
||||
print(f"❌ 失败: {str(e)}")
|
||||
|
||||
# 测试4: 双重JSON编码(模拟数据库存储场景)
|
||||
print("\n测试 4: 双重JSON编码字符串")
|
||||
cookies_dict = {"a1": "xxx", "webId": "yyy"}
|
||||
# 第一次JSON编码
|
||||
cookies_json_1 = json.dumps(cookies_dict)
|
||||
# 第二次JSON编码
|
||||
cookies_json_2 = json.dumps(cookies_json_1)
|
||||
|
||||
print(f" 原始字典: {cookies_dict}")
|
||||
print(f" 第一次编码: {cookies_json_1}")
|
||||
print(f" 第二次编码: {cookies_json_2}")
|
||||
|
||||
# 模拟从数据库读取并解析
|
||||
try:
|
||||
# 第一次解析
|
||||
cookies_parsed_1 = json.loads(cookies_json_2)
|
||||
print(f" 第一次解析后类型: {type(cookies_parsed_1).__name__}")
|
||||
|
||||
# 处理双重编码
|
||||
if isinstance(cookies_parsed_1, str):
|
||||
cookies_parsed_2 = json.loads(cookies_parsed_1)
|
||||
print(f" 第二次解析后类型: {type(cookies_parsed_2).__name__}")
|
||||
cookies = cookies_parsed_2
|
||||
else:
|
||||
cookies = cookies_parsed_1
|
||||
|
||||
# 格式化
|
||||
result = _format_cookies(cookies)
|
||||
print(f"✅ 成功处理双重JSON编码")
|
||||
print(f" 最终输出: {type(result).__name__} with {len(result)} items")
|
||||
assert isinstance(result, list)
|
||||
except Exception as e:
|
||||
print(f"❌ 失败: {str(e)}")
|
||||
|
||||
# 测试5: 错误格式 - 字符串(不是JSON)
|
||||
print("\n测试 5: 错误格式 - 普通字符串")
|
||||
try:
|
||||
result = _format_cookies("invalid_string")
|
||||
print(f"❌ 应该抛出异常但没有")
|
||||
except ValueError as e:
|
||||
print(f"✅ 正确抛出ValueError异常")
|
||||
print(f" 错误信息: {str(e)}")
|
||||
except Exception as e:
|
||||
print(f"❌ 抛出了非预期的异常: {str(e)}")
|
||||
|
||||
# 测试6: 错误格式 - 列表中包含非字典元素
|
||||
print("\n测试 6: 错误格式 - 列表中包含非字典元素")
|
||||
try:
|
||||
result = _format_cookies(["string_item", 123])
|
||||
print(f"❌ 应该抛出异常但没有")
|
||||
except ValueError as e:
|
||||
print(f"✅ 正确抛出ValueError异常")
|
||||
print(f" 错误信息: {str(e)}")
|
||||
except Exception as e:
|
||||
print(f"❌ 抛出了非预期的异常: {str(e)}")
|
||||
|
||||
# 测试7: Playwright原生格式中value为对象(保持原样)
|
||||
print("\n测试 7: Playwright原生格式中value为对象(应保持原样)")
|
||||
cookies_with_object_value = [
|
||||
{
|
||||
"name": "test_cookie",
|
||||
"value": {"nested": "object"}, # value是对象
|
||||
"domain": ".xiaohongshu.com",
|
||||
"path": "/"
|
||||
}
|
||||
]
|
||||
try:
|
||||
result = _format_cookies(cookies_with_object_value)
|
||||
print(f"✅ Playwright原生格式被完整保留")
|
||||
print(f" 输入value类型: {type(cookies_with_object_value[0]['value']).__name__}")
|
||||
print(f" 输出value类型: {type(result[0]['value']).__name__}")
|
||||
print(f" 输出value内容: {result[0]['value']}")
|
||||
        # The native Playwright format is left completely unmodified, including value
|
||||
assert result == cookies_with_object_value, "Playwright原生格式应完整保留"
|
||||
except Exception as e:
|
||||
print(f"❌ 失败: {str(e)}")
|
||||
|
||||
# 测试8: 字典格式中value为数字
|
||||
print("\n测试 8: 字典格式中value为数字(应自动转换为字符串)")
|
||||
cookies_dict_with_number = {
|
||||
"a1": "xxx",
|
||||
"user_id": 12345, # value是数字
|
||||
"is_login": True # value是布尔值
|
||||
}
|
||||
try:
|
||||
result = _format_cookies(cookies_dict_with_number)
|
||||
print(f"✅ 成功处理数字/布尔value")
|
||||
print(f" 输入: {cookies_dict_with_number}")
|
||||
print(f" user_id value类型: {type(result[1]['value']).__name__}, 值: {result[1]['value']}")
|
||||
print(f" is_login value类型: {type(result[2]['value']).__name__}, 值: {result[2]['value']}")
|
||||
# 验证不再包含expires等字段
|
||||
print(f" 字段: {list(result[0].keys())}")
|
||||
assert all(isinstance(c['value'], str) for c in result), "所有value应该都是字符串类型"
|
||||
assert 'expires' not in result[0], "不应该包含expires字段"
|
||||
except Exception as e:
|
||||
print(f"❌ 失败: {str(e)}")
|
||||
|
||||
# 测试9: Playwright原生格式中expires=-1(应被保留)
|
||||
print("\n测试 9: Playwright原生格式中expires=-1(应被保留)")
|
||||
cookies_with_invalid_expires = [
|
||||
{
|
||||
"name": "test_cookie",
|
||||
"value": "test_value",
|
||||
"domain": ".xiaohongshu.com",
|
||||
"path": "/",
|
||||
"expires": -1 # Playwright原生格式
|
||||
}
|
||||
]
|
||||
try:
|
||||
result = _format_cookies(cookies_with_invalid_expires)
|
||||
print(f"✅ Playwright原生格式被完整保留")
|
||||
print(f" 原始字段: {list(cookies_with_invalid_expires[0].keys())}")
|
||||
print(f" 处理后字段: {list(result[0].keys())}")
|
||||
assert result == cookies_with_invalid_expires, "Playwright原生格式应被完整保留"
|
||||
assert 'expires' in result[0] and result[0]['expires'] == -1, "expires=-1应该被保留"
|
||||
except Exception as e:
|
||||
print(f"❌ 失败: {str(e)}")
|
||||
|
||||
# 测试10: Playwright原生格式中expires为浮点数(应被保留)
|
||||
print("\n测试 10: Playwright原生格式中expires为浮点数(应被保留)")
|
||||
cookies_with_float_expires = [
|
||||
{
|
||||
"name": "test_cookie",
|
||||
"value": "test_value",
|
||||
"domain": ".xiaohongshu.com",
|
||||
"path": "/",
|
||||
"expires": 1797066497.112584 # Playwright原生格式常常有浮点数
|
||||
}
|
||||
]
|
||||
try:
|
||||
result = _format_cookies(cookies_with_float_expires)
|
||||
print(f"✅ Playwright原生格式被完整保留")
|
||||
print(f" 原始expires: {cookies_with_float_expires[0]['expires']} (类型: {type(cookies_with_float_expires[0]['expires']).__name__})")
|
||||
print(f" 处理后expires: {result[0]['expires']} (类型: {type(result[0]['expires']).__name__})")
|
||||
assert result == cookies_with_float_expires, "Playwright原生格式应被完整保留"
|
||||
assert isinstance(result[0]['expires'], float), "expires浮点数应该被保留"
|
||||
except Exception as e:
|
||||
print(f"❌ 失败: {str(e)}")
|
||||
|
||||
# 测试11: Playwright原生格式中sameSite大小写(应被保留)
|
||||
print("\n测试 11: Playwright原生格式中sameSite(应被完整保留)")
|
||||
cookies_with_samesite = [
|
||||
{
|
||||
"name": "test_cookie1",
|
||||
"value": "test_value1",
|
||||
"domain": ".xiaohongshu.com",
|
||||
"path": "/",
|
||||
"sameSite": "Lax" # Playwright原生格式
|
||||
},
|
||||
{
|
||||
"name": "test_cookie2",
|
||||
"value": "test_value2",
|
||||
"domain": ".xiaohongshu.com",
|
||||
"path": "/",
|
||||
"sameSite": "Strict"
|
||||
}
|
||||
]
|
||||
try:
|
||||
result = _format_cookies(cookies_with_samesite)
|
||||
print(f"✅ Playwright原生格式被完整保留")
|
||||
print(f" cookie1 sameSite: {result[0]['sameSite']}")
|
||||
print(f" cookie2 sameSite: {result[1]['sameSite']}")
|
||||
assert result == cookies_with_samesite, "Playwright原生格式应被完整保留"
|
||||
assert result[0]['sameSite'] == 'Lax'
|
||||
assert result[1]['sameSite'] == 'Strict'
|
||||
except Exception as e:
|
||||
print(f"❌ 失败: {str(e)}")
|
||||
|
||||
print("\n" + "="*60)
|
||||
print("测试完成")
|
||||
print("="*60)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
test_format_cookies()
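Test 4 above unwraps a double-encoded JSON string by hand. The sketch below shows that step as a reusable helper. The name `decode_nested_json` and its `max_depth` parameter are hypothetical and do not appear in this commit; the real `_format_cookies` may handle this case differently.

```python
import json
from typing import Any


def decode_nested_json(raw: Any, max_depth: int = 3) -> Any:
    """Call json.loads repeatedly while the value is still a string.

    Hypothetical helper (not part of this commit). It mirrors the manual
    two-step parse in test 4, where a dict went through json.dumps twice.
    """
    value = raw
    for _ in range(max_depth):
        if not isinstance(value, str):
            break  # already a dict/list/number, nothing left to decode
        try:
            value = json.loads(value)
        except json.JSONDecodeError as exc:
            # A string that is not JSON is reported as ValueError
            raise ValueError(f"not valid JSON: {value[:30]!r}") from exc
    return value


cookies_dict = {"a1": "xxx", "webId": "yyy"}
double_encoded = json.dumps(json.dumps(cookies_dict))
print(decode_nested_json(double_encoded) == cookies_dict)
```

Values that are already a list or dict pass through untouched, so the helper can be applied before any format check without first asking how many times the data was encoded.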
|
||||
31
backend/test_cookie_inject.bat
Normal file
@@ -0,0 +1,31 @@
|
||||
@echo off
|
||||
chcp 65001 >nul
|
||||
echo ========================================
|
||||
echo 小红书Cookie注入测试工具
|
||||
echo ========================================
|
||||
echo.
|
||||
echo 此工具使用Playwright真实注入Cookie
|
||||
echo 支持验证Cookie有效性并跳转到指定页面
|
||||
echo.
|
||||
echo ========================================
|
||||
echo.
|
||||
|
||||
cd /d "%~dp0"
|
||||
|
||||
REM 检查是否有cookies.json文件
|
||||
if exist cookies.json (
|
||||
echo 检测到 cookies.json 文件
|
||||
echo.
|
||||
python test_cookie_inject.py
|
||||
) else (
|
||||
echo 未找到 cookies.json 文件
|
||||
echo 请先准备Cookie文件或在程序中手动输入
|
||||
echo.
|
||||
python test_cookie_inject.py
|
||||
)
|
||||
|
||||
echo.
|
||||
echo ========================================
|
||||
echo 测试完成
|
||||
echo ========================================
|
||||
pause
|
||||
398
backend/test_cookie_inject.py
Normal file
@@ -0,0 +1,398 @@
|
||||
"""
|
||||
Cookie注入测试脚本
|
||||
使用Playwright注入Cookie并验证其有效性
|
||||
支持跳转到创作者中心或小红书首页
|
||||
"""
|
||||
import asyncio
|
||||
import sys
|
||||
import json
|
||||
from pathlib import Path
|
||||
from playwright.async_api import async_playwright
|
||||
from typing import Optional, List, Dict, Any
|
||||
|
||||
|
||||
class CookieInjector:
|
||||
"""Cookie注入器"""
|
||||
|
||||
def __init__(self, headless: bool = False):
|
||||
"""
|
||||
初始化Cookie注入器
|
||||
|
||||
Args:
|
||||
headless: 是否使用无头模式,False可以看到浏览器界面
|
||||
"""
|
||||
self.headless = headless
|
||||
self.playwright = None
|
||||
self.browser = None
|
||||
self.context = None
|
||||
self.page = None
|
||||
|
||||
async def init_browser(self):
|
||||
"""初始化浏览器"""
|
||||
try:
|
||||
print("正在启动浏览器...")
|
||||
|
||||
# Windows环境下设置事件循环策略
|
||||
if sys.platform == 'win32':
|
||||
try:
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
except Exception as e:
|
||||
print(f"警告: 设置事件循环策略失败: {str(e)}")
|
||||
|
||||
self.playwright = await async_playwright().start()
|
||||
|
||||
# 启动浏览器
|
||||
self.browser = await self.playwright.chromium.launch(
|
||||
headless=self.headless,
|
||||
args=['--disable-blink-features=AutomationControlled']
|
||||
)
|
||||
|
||||
# 创建浏览器上下文
|
||||
self.context = await self.browser.new_context(
|
||||
viewport={'width': 1280, 'height': 720},
|
||||
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
|
||||
)
|
||||
|
||||
# 创建新页面
|
||||
self.page = await self.context.new_page()
|
||||
|
||||
print("浏览器初始化成功")
|
||||
|
||||
except Exception as e:
|
||||
print(f"浏览器初始化失败: {str(e)}")
|
||||
raise
|
||||
|
||||
async def inject_cookies(self, cookies: List[Dict[str, Any]]) -> bool:
|
||||
"""
|
||||
注入Cookie
|
||||
|
||||
Args:
|
||||
cookies: Cookie列表
|
||||
|
||||
Returns:
|
||||
是否注入成功
|
||||
"""
|
||||
try:
|
||||
if not self.context:
|
||||
await self.init_browser()
|
||||
|
||||
print(f"正在注入 {len(cookies)} 个Cookie...")
|
||||
|
||||
# 注入Cookie到浏览器上下文
|
||||
await self.context.add_cookies(cookies)
|
||||
|
||||
print("Cookie注入成功")
|
||||
return True
|
||||
|
||||
except Exception as e:
|
||||
print(f"Cookie注入失败: {str(e)}")
|
||||
return False
|
||||
|
||||
async def verify_and_navigate(self, target_page: str = 'creator') -> Dict[str, Any]:
|
||||
"""
|
||||
验证Cookie并跳转到指定页面
|
||||
|
||||
Args:
|
||||
target_page: 目标页面类型 ('creator' 或 'home')
|
||||
|
||||
Returns:
|
||||
验证结果字典
|
||||
"""
|
||||
try:
|
||||
if not self.page:
|
||||
return {"success": False, "error": "浏览器未初始化"}
|
||||
|
||||
# 确定目标URL
|
||||
urls = {
|
||||
'creator': 'https://creator.xiaohongshu.com',
|
||||
'home': 'https://www.xiaohongshu.com'
|
||||
}
|
||||
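            # Fall back to the creator center for unknown values so that the
            # URL and the displayed page name always agree
            if target_page not in urls:
                target_page = 'creator'
            target_url = urls[target_page]
            page_name = '创作者中心' if target_page == 'creator' else '小红书首页'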
|
||||
|
||||
print(f"\n正在访问{page_name}: {target_url}")
|
||||
|
||||
# 访问目标页面
|
||||
await self.page.goto(target_url, wait_until='networkidle', timeout=30000)
|
||||
await asyncio.sleep(2) # 等待页面完全加载
|
||||
|
||||
# 获取当前URL和标题
|
||||
current_url = self.page.url
|
||||
title = await self.page.title()
|
||||
|
||||
print(f"当前URL: {current_url}")
|
||||
print(f"页面标题: {title}")
|
||||
|
||||
# 检查是否被重定向到登录页
|
||||
is_logged_in = 'login' not in current_url.lower()
|
||||
|
||||
if is_logged_in:
|
||||
print("Cookie验证成功,已登录状态")
|
||||
|
||||
# 尝试获取用户信息
|
||||
try:
|
||||
# 等待用户相关元素出现(如头像、用户名等)
|
||||
await self.page.wait_for_selector('[class*="avatar"], [class*="user"]', timeout=5000)
|
||||
print("检测到用户信息元素,确认登录成功")
|
||||
except Exception:
|
||||
print("未检测到明显的用户信息元素,但未跳转到登录页")
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"message": f"Cookie有效,已成功访问{page_name}",
|
||||
"url": current_url,
|
||||
"title": title,
|
||||
"logged_in": True
|
||||
}
|
||||
else:
|
||||
print("Cookie可能已失效,页面跳转到登录页")
|
||||
return {
|
||||
"success": False,
|
||||
"error": "Cookie已失效或无效,页面跳转到登录页",
|
||||
"url": current_url,
|
||||
"title": title,
|
||||
"logged_in": False
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
print(f"验证过程异常: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return {
|
||||
"success": False,
|
||||
"error": f"验证过程异常: {str(e)}"
|
||||
}
|
||||
|
||||
async def keep_browser_open(self, duration: int = 60):
|
||||
"""
|
||||
保持浏览器打开一段时间,方便观察
|
||||
|
||||
Args:
|
||||
duration: 保持打开的秒数,0表示永久打开直到手动关闭
|
||||
"""
|
||||
try:
|
||||
if duration == 0:
|
||||
print("\n浏览器将保持打开,按 Ctrl+C 关闭...")
|
||||
try:
|
||||
while True:
|
||||
await asyncio.sleep(1)
|
||||
except KeyboardInterrupt:
|
||||
print("\n用户中断,准备关闭浏览器...")
|
||||
else:
|
||||
print(f"\n浏览器将保持打开 {duration} 秒...")
|
||||
await asyncio.sleep(duration)
|
||||
print("时间到,准备关闭浏览器...")
|
||||
except Exception as e:
|
||||
print(f"保持浏览器异常: {str(e)}")
|
||||
|
||||
async def close_browser(self):
|
||||
"""关闭浏览器"""
|
||||
try:
|
||||
print("\n正在关闭浏览器...")
|
||||
if self.page:
|
||||
await self.page.close()
|
||||
if self.context:
|
||||
await self.context.close()
|
||||
if self.browser:
|
||||
await self.browser.close()
|
||||
if self.playwright:
|
||||
await self.playwright.stop()
|
||||
print("浏览器已关闭")
|
||||
except Exception as e:
|
||||
print(f"关闭浏览器异常: {str(e)}")
|
||||
|
||||
|
||||
def load_cookies_from_file(file_path: str) -> Optional[List[Dict[str, Any]]]:
    """
    Load cookies from a JSON file.

    Args:
        file_path: Path to the cookie file.

    Returns:
        The list of cookies, or None on failure.
    """
    try:
        cookie_file = Path(file_path)
        if not cookie_file.exists():
            print(f"Cookie file not found: {file_path}")
            return None

        with open(cookie_file, 'r', encoding='utf-8') as f:
            cookies = json.load(f)

        if not isinstance(cookies, list):
            print("Invalid cookie format: the top-level value must be an array")
            return None

        if len(cookies) == 0:
            print("The cookie array is empty")
            return None

        # Every cookie must be an object with a non-empty name and a value key.
        # An empty-string value is legal, so check for presence, not truthiness.
        for cookie in cookies:
            if not isinstance(cookie, dict) or not cookie.get('name') or 'value' not in cookie:
                print("Invalid cookie format: missing name or value field")
                return None

        print(f"Loaded {len(cookies)} cookies")
        return cookies

    except json.JSONDecodeError as e:
        print(f"Failed to parse the cookie file as JSON: {str(e)}")
        return None
    except Exception as e:
        print(f"Failed to load the cookie file: {str(e)}")
        return None
|
||||
|
||||
|
||||
async def test_cookie_inject(
|
||||
cookies_source: str,
|
||||
target_page: str = 'creator',
|
||||
headless: bool = False,
|
||||
keep_open: int = 0
|
||||
):
|
||||
"""
|
||||
测试Cookie注入
|
||||
|
||||
Args:
|
||||
cookies_source: Cookie来源(文件路径或JSON字符串)
|
||||
target_page: 目标页面 ('creator' 或 'home')
|
||||
headless: 是否使用无头模式
|
||||
keep_open: 保持浏览器打开的秒数(0表示永久打开)
|
||||
"""
|
||||
print("="*60)
|
||||
print("Cookie注入并验证测试")
|
||||
print("="*60)
|
||||
|
||||
# 加载Cookie
|
||||
cookies = None
|
||||
|
||||
# 尝试作为文件路径加载
|
||||
if Path(cookies_source).exists():
|
||||
print(f"\n从文件加载Cookie: {cookies_source}")
|
||||
cookies = load_cookies_from_file(cookies_source)
|
||||
else:
|
||||
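        # Otherwise try to parse the input as a JSON string
        try:
            print("\nParsing the cookie JSON string...")
            cookies = json.loads(cookies_source)
            if isinstance(cookies, list) and len(cookies) > 0:
                print(f"Parsed {len(cookies)} cookies")
            else:
                # Anything other than a non-empty array cannot be passed to add_cookies
                print("Invalid cookie format: expected a non-empty JSON array")
                cookies = None
        except Exception as e:
            print(f"Failed to parse cookies: {str(e)}")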
|
||||
|
||||
if not cookies:
|
||||
print("\n加载Cookie失败,请检查输入")
|
||||
return
|
||||
|
||||
# 创建注入器
|
||||
injector = CookieInjector(headless=headless)
|
||||
|
||||
try:
|
||||
# 初始化浏览器
|
||||
await injector.init_browser()
|
||||
|
||||
# 注入Cookie
|
||||
inject_success = await injector.inject_cookies(cookies)
|
||||
|
||||
if not inject_success:
|
||||
print("\nCookie注入失败")
|
||||
return
|
||||
|
||||
# 验证并跳转
|
||||
result = await injector.verify_and_navigate(target_page)
|
||||
|
||||
print("\n" + "="*60)
|
||||
print("验证结果")
|
||||
print("="*60)
|
||||
|
||||
if result.get('success'):
|
||||
print(f"状态: 成功")
|
||||
print(f"消息: {result.get('message')}")
|
||||
print(f"URL: {result.get('url')}")
|
||||
print(f"标题: {result.get('title')}")
|
||||
print(f"登录状态: {'已登录' if result.get('logged_in') else '未登录'}")
|
||||
else:
|
||||
print(f"状态: 失败")
|
||||
print(f"错误: {result.get('error')}")
|
||||
if result.get('url'):
|
||||
print(f"当前URL: {result.get('url')}")
|
||||
|
||||
# 保持浏览器打开
|
||||
if keep_open >= 0:
|
||||
await injector.keep_browser_open(keep_open)
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("\n\n用户中断测试")
|
||||
except Exception as e:
|
||||
print(f"\n测试过程异常: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
finally:
|
||||
await injector.close_browser()
|
||||
|
||||
print("\n" + "="*60)
|
||||
print("测试完成")
|
||||
print("="*60)
|
||||
|
||||
|
||||
async def main():
|
||||
"""主函数"""
|
||||
print("="*60)
|
||||
print("小红书Cookie注入测试工具")
|
||||
print("="*60)
|
||||
|
||||
print("\n功能说明:")
|
||||
print("1. 注入Cookie到浏览器")
|
||||
print("2. 验证Cookie有效性")
|
||||
print("3. 跳转到指定页面(创作者中心/小红书首页)")
|
||||
|
||||
print("\n" + "="*60)
|
||||
|
||||
# 输入Cookie来源
|
||||
print("\n请输入Cookie来源:")
|
||||
print("1. 输入Cookie文件路径(如: cookies.json)")
|
||||
print("2. 直接粘贴JSON格式的Cookie")
|
||||
|
||||
cookies_source = input("\nCookie来源: ").strip()
|
||||
|
||||
if not cookies_source:
|
||||
print("Cookie来源不能为空")
|
||||
return
|
||||
|
||||
# 选择目标页面
|
||||
print("\n请选择目标页面:")
|
||||
print("1. 创作者中心(creator.xiaohongshu.com)")
|
||||
print("2. 小红书首页(www.xiaohongshu.com)")
|
||||
|
||||
page_choice = input("\n选择 (1 或 2, 默认为 1): ").strip()
|
||||
target_page = 'home' if page_choice == '2' else 'creator'
|
||||
|
||||
# 选择浏览器模式
|
||||
headless_choice = input("\n是否使用无头模式?(y/n, 默认为 n): ").strip().lower()
|
||||
headless = headless_choice == 'y'
|
||||
|
||||
# 选择保持打开时间
|
||||
keep_open_input = input("\n保持浏览器打开时间(秒,0表示直到手动关闭,默认60): ").strip()
|
||||
try:
|
||||
keep_open = int(keep_open_input) if keep_open_input else 60
|
||||
except ValueError:
|
||||
keep_open = 60
|
||||
|
||||
# 执行测试
|
||||
await test_cookie_inject(
|
||||
cookies_source=cookies_source,
|
||||
target_page=target_page,
|
||||
headless=headless,
|
||||
keep_open=keep_open
|
||||
)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Windows环境下设置事件循环策略
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
# 运行测试
|
||||
asyncio.run(main())
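Playwright's `add_cookies` expects each cookie to carry a string `name`, a string `value`, and either a `url` or a `domain`/`path` pair. The sketch below checks a cookie list against those rules before a browser is started. `find_cookie_problems` is a hypothetical helper and not part of this commit, and it covers only these three rules, not every field Playwright validates.

```python
from typing import Any, Dict, List


def find_cookie_problems(cookies: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the cookies look usable.

    Hypothetical pre-flight check, not part of this commit.
    """
    problems: List[str] = []
    for index, cookie in enumerate(cookies):
        if not isinstance(cookie, dict):
            problems.append(
                f"cookie {index}: expected an object, got {type(cookie).__name__}"
            )
            continue
        name = cookie.get("name")
        if not isinstance(name, str) or not name:
            problems.append(f"cookie {index}: 'name' must be a non-empty string")
        if not isinstance(cookie.get("value"), str):
            problems.append(f"cookie {index}: 'value' must be a string")
        has_url = bool(cookie.get("url"))
        has_domain_and_path = bool(cookie.get("domain")) and bool(cookie.get("path"))
        if not (has_url or has_domain_and_path):
            problems.append(
                f"cookie {index}: needs either 'url' or both 'domain' and 'path'"
            )
    return problems


good = [{"name": "a1", "value": "xxx", "domain": ".xiaohongshu.com", "path": "/"}]
bad = [{"name": "a1", "value": 12345}, "string_item"]

print(find_cookie_problems(good))
for problem in find_cookie_problems(bad):
    print(problem)
```

Collecting all problems instead of stopping at the first one makes it easier to repair a hand-edited `cookies.json` in a single pass.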
|
||||
207
backend/test_damai_proxy.py
Normal file
@@ -0,0 +1,207 @@
|
||||
"""
|
||||
大麦固定代理IP测试脚本
|
||||
测试两个固定代理IP在无头浏览器中的可用性
|
||||
"""
|
||||
import asyncio
|
||||
import sys
|
||||
from playwright.async_api import async_playwright
|
||||
|
||||
|
||||
# 大麦固定代理IP配置
|
||||
DAMAI_PROXIES = [
|
||||
{
|
||||
"name": "大麦代理1",
|
||||
"server": "http://36.137.177.131:50001",
|
||||
"username": "qqwvy0",
|
||||
"password": "mun3r7xz"
|
||||
},
|
||||
{
|
||||
"name": "大麦代理2",
|
||||
"server": "http://111.132.40.72:50002",
|
||||
"username": "ih3z07",
|
||||
"password": "078bt7o5"
|
||||
}
|
||||
]
|
||||
|
||||
|
||||
async def test_proxy(proxy_config: dict):
|
||||
"""
|
||||
测试单个代理IP
|
||||
|
||||
Args:
|
||||
proxy_config: 代理配置字典
|
||||
"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"🔍 开始测试: {proxy_config['name']}")
|
||||
print(f" 代理服务器: {proxy_config['server']}")
|
||||
    # Mask the password so that credentials do not end up in console logs
    print(f"   Credentials: {proxy_config['username']} / {'*' * len(proxy_config['password'])}")
|
||||
print(f"{'='*60}")
|
||||
|
||||
playwright = None
|
||||
browser = None
|
||||
|
||||
try:
|
||||
# 启动Playwright
|
||||
playwright = await async_playwright().start()
|
||||
print("✅ Playwright启动成功")
|
||||
|
||||
# 配置代理
|
||||
proxy_settings = {
|
||||
"server": proxy_config["server"],
|
||||
"username": proxy_config["username"],
|
||||
"password": proxy_config["password"]
|
||||
}
|
||||
|
||||
# 启动浏览器(带代理)
|
||||
print(f"🚀 正在启动浏览器(使用代理: {proxy_config['server']})...")
|
||||
browser = await playwright.chromium.launch(
|
||||
headless=True,
|
||||
proxy=proxy_settings,
|
||||
args=[
|
||||
'--disable-blink-features=AutomationControlled',
|
||||
'--no-sandbox',
|
||||
'--disable-setuid-sandbox',
|
||||
'--disable-dev-shm-usage',
|
||||
'--disable-web-security',
|
||||
'--disable-features=IsolateOrigins,site-per-process',
|
||||
]
|
||||
)
|
||||
print("✅ 浏览器启动成功")
|
||||
|
||||
# 创建上下文
|
||||
context = await browser.new_context(
|
||||
viewport={'width': 1280, 'height': 720},
|
||||
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
|
||||
)
|
||||
print("✅ 浏览器上下文创建成功")
|
||||
|
||||
# 创建页面
|
||||
page = await context.new_page()
|
||||
print("✅ 页面创建成功")
|
||||
|
||||
# 测试1: 访问IP检测网站(检查代理IP是否生效)
|
||||
print("\n📍 测试1: 访问IP检测网站...")
|
||||
try:
|
||||
await page.goto("http://httpbin.org/ip", timeout=30000)
|
||||
await asyncio.sleep(2)
|
||||
|
||||
# 获取页面内容
|
||||
content = await page.content()
|
||||
print("✅ 访问成功,页面内容:")
|
||||
print(content[:500]) # 只显示前500字符
|
||||
|
||||
# 尝试提取IP信息
|
||||
ip_info = await page.evaluate("() => document.body.innerText")
|
||||
print(f"\n🌐 当前IP信息:\n{ip_info}")
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 测试1失败: {str(e)}")
|
||||
|
||||
# 测试2: 访问小红书登录页(检查代理在实际场景中是否可用)
|
||||
print("\n📍 测试2: 访问小红书登录页...")
|
||||
try:
|
||||
await page.goto("https://creator.xiaohongshu.com/login", timeout=30000)
|
||||
await asyncio.sleep(3)
|
||||
|
||||
title = await page.title()
|
||||
url = page.url
|
||||
print(f"✅ 访问成功")
|
||||
print(f" 页面标题: {title}")
|
||||
print(f" 当前URL: {url}")
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 测试2失败: {str(e)}")
|
||||
|
||||
# 测试3: 访问大麦网(测试目标网站)
|
||||
print("\n📍 测试3: 访问大麦网...")
|
||||
try:
|
||||
await page.goto("https://www.damai.cn/", timeout=30000)
|
||||
await asyncio.sleep(3)
|
||||
|
||||
title = await page.title()
|
||||
url = page.url
|
||||
print(f"✅ 访问成功")
|
||||
print(f" 页面标题: {title}")
|
||||
print(f" 当前URL: {url}")
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 测试3失败: {str(e)}")
|
||||
|
||||
print(f"\n✅ {proxy_config['name']} 测试完成")
|
||||
|
||||
except Exception as e:
|
||||
print(f"\n❌ {proxy_config['name']} 测试失败: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
|
||||
finally:
|
||||
# 清理资源
|
||||
try:
|
||||
if browser:
|
||||
await browser.close()
|
||||
print("🧹 浏览器已关闭")
|
||||
if playwright:
|
||||
await playwright.stop()
|
||||
print("🧹 Playwright已停止")
|
||||
except Exception as e:
|
||||
print(f"⚠️ 清理资源时出错: {str(e)}")
|
||||
|
||||
|
||||
async def test_all_proxies():
|
||||
"""测试所有代理IP"""
|
||||
print("\n" + "="*60)
|
||||
print("🎯 大麦固定代理IP测试")
|
||||
print("="*60)
|
||||
print(f"📊 共配置 {len(DAMAI_PROXIES)} 个代理IP")
|
||||
|
||||
# 依次测试每个代理
|
||||
for i, proxy_config in enumerate(DAMAI_PROXIES, 1):
|
||||
print(f"\n\n{'#'*60}")
|
||||
print(f"# 测试进度: {i}/{len(DAMAI_PROXIES)}")
|
||||
print(f"{'#'*60}")
|
||||
|
||||
await test_proxy(proxy_config)
|
||||
|
||||
# 测试间隔
|
||||
if i < len(DAMAI_PROXIES):
|
||||
print(f"\n⏳ 等待5秒后测试下一个代理...")
|
||||
await asyncio.sleep(5)
|
||||
|
||||
print("\n" + "="*60)
|
||||
print("🎉 所有代理测试完成!")
|
||||
print("="*60)
|
||||
|
||||
|
||||
async def test_single_proxy(index: int = 0):
|
||||
"""
|
||||
测试单个代理IP
|
||||
|
||||
Args:
|
||||
index: 代理索引(0或1)
|
||||
"""
|
||||
if index < 0 or index >= len(DAMAI_PROXIES):
|
||||
        print(f"❌ Invalid proxy index: {index}; use a value from 0 to {len(DAMAI_PROXIES) - 1}")
|
||||
return
|
||||
|
||||
await test_proxy(DAMAI_PROXIES[index])
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Windows环境下设置事件循环策略
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
# 解析命令行参数
|
||||
if len(sys.argv) > 1:
|
||||
try:
|
||||
proxy_index = int(sys.argv[1])
|
||||
print(f"🎯 测试单个代理(索引: {proxy_index})")
|
||||
asyncio.run(test_single_proxy(proxy_index))
|
||||
except ValueError:
|
||||
print("❌ 参数错误,请使用: python test_damai_proxy.py [0|1]")
|
||||
print(" 0: 测试代理1")
|
||||
print(" 1: 测试代理2")
|
||||
print(" 不带参数: 测试所有代理")
|
||||
else:
|
||||
# 测试所有代理
|
||||
asyncio.run(test_all_proxies())
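The proxy credentials in this script are hard-coded in the source. One possible alternative, sketched below, reads them from environment variables and masks the password before printing. The variable names (`DAMAI_PROXY_1_SERVER` and so on) and the helpers `proxy_from_env` and `mask` are assumptions made for illustration; none of them exist in this commit.

```python
import os
from typing import Dict, Mapping, Optional


def proxy_from_env(prefix: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build a Playwright-style proxy dict from <prefix>_SERVER/_USERNAME/_PASSWORD.

    Hypothetical helper. `env` defaults to os.environ and can be replaced by a
    plain dict for testing.
    """
    env = os.environ if env is None else env
    try:
        return {
            "server": env[f"{prefix}_SERVER"],
            "username": env[f"{prefix}_USERNAME"],
            "password": env[f"{prefix}_PASSWORD"],
        }
    except KeyError as exc:
        raise RuntimeError(f"missing environment variable: {exc.args[0]}") from exc


def mask(secret: str) -> str:
    """Keep the first character and hide the rest, for log output."""
    return secret[:1] + "*" * max(len(secret) - 1, 0)


# Example with a fake environment; real code would omit the second argument
fake_env = {
    "DAMAI_PROXY_1_SERVER": "http://127.0.0.1:8080",
    "DAMAI_PROXY_1_USERNAME": "demo_user",
    "DAMAI_PROXY_1_PASSWORD": "demo_pass",
}
proxy = proxy_from_env("DAMAI_PROXY_1", fake_env)
print(proxy["server"], proxy["username"], mask(proxy["password"]))
```

The resulting dict has the same `server`/`username`/`password` keys that `test_proxy` already passes to `chromium.launch(proxy=...)`.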
|
||||
282
backend/test_headless_comparison.py
Normal file
@@ -0,0 +1,282 @@
|
||||
"""
|
||||
对比测试有头模式和无头模式的页面获取情况
|
||||
"""
|
||||
import asyncio
|
||||
from playwright.async_api import async_playwright
|
||||
import sys
|
||||
|
||||
|
||||
async def test_headless_comparison(proxy_index: int = 0):
|
||||
"""对比测试有头模式和无头模式"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"🔍 对比测试有头模式 vs 无头模式")
|
||||
print(f"{'='*60}")
|
||||
|
||||
    # Read the proxy settings from the shared proxy configuration
    from damai_proxy_config import get_proxy_config
    proxy_config = get_proxy_config(proxy_index)

    print(f"✅ Using proxy: proxy {proxy_index + 1}")
    print(f"   Proxy server: {proxy_config['server']}")

    # Build the Playwright proxy object directly from the configuration.
    # Assembling a URL and splitting it on '@' and ':' breaks as soon as the
    # password contains either character.
    server = proxy_config['server']
    if '://' not in server:
        server = f"http://{server}"
    proxy_config_obj = {
        "server": server,
        "username": proxy_config['username'],
        "password": proxy_config['password']
    }

    # Do not print the password
    print(f"   Proxy object: server={proxy_config_obj['server']}, username={proxy_config_obj['username']}")
|
||||
|
||||
# 测试无头模式
|
||||
print(f"\n🧪 测试 1/2: 无头模式 (headless=True)")
|
||||
await test_single_mode(True, proxy_config_obj)
|
||||
|
||||
print(f"\n🧪 测试 2/2: 有头模式 (headless=False)")
|
||||
await test_single_mode(False, proxy_config_obj)
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("✅ 对比测试完成!")
|
||||
print("="*60)
|
||||
|
||||
|
||||
async def test_single_mode(headless: bool, proxy_config_obj: dict):
|
||||
"""测试单个模式"""
|
||||
mode_name = "无头模式" if headless else "有头模式"
|
||||
print(f" 正在启动浏览器 ({mode_name})...")
|
||||
|
||||
try:
|
||||
async with async_playwright() as p:
|
||||
# 启动浏览器
|
||||
browser = await p.chromium.launch(
|
||||
headless=headless,
|
||||
proxy=proxy_config_obj,
|
||||
# 添加一些额外参数以提高稳定性
|
||||
args=[
|
||||
'--no-sandbox',
|
||||
'--disable-setuid-sandbox',
|
||||
'--disable-dev-shm-usage',
|
||||
'--disable-blink-features=AutomationControlled',
|
||||
]
|
||||
)
|
||||
|
||||
# 创建上下文
|
||||
context = await browser.new_context(
|
||||
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
|
||||
viewport={'width': 1280, 'height': 720}
|
||||
)
|
||||
|
||||
# 创建页面
|
||||
page = await context.new_page()
|
||||
|
||||
# 访问小红书登录页面
|
||||
print(f" 访问小红书登录页...")
|
||||
try:
|
||||
# 使用不同的wait_until策略
|
||||
await page.goto('https://creator.xiaohongshu.com/login',
|
||||
wait_until='domcontentloaded',
|
||||
timeout=15000)
|
||||
|
||||
# 等待一段时间让页面内容加载
|
||||
await asyncio.sleep(3)
|
||||
|
||||
# 获取页面信息
|
||||
title = await page.title()
|
||||
url = page.url
|
||||
content = await page.content()
|
||||
content_len = len(content)
|
||||
|
||||
print(f" ✅ {mode_name} - 访问成功")
|
||||
print(f" 标题: {title}")
|
||||
print(f" URL: {url}")
|
||||
print(f" 内容长度: {content_len} 字符")
|
||||
|
||||
# 检查关键元素
|
||||
phone_input = await page.query_selector('input[placeholder="手机号"]')
|
||||
if phone_input:
|
||||
print(f" ✅ 找到手机号输入框")
|
||||
else:
|
||||
print(f" ❌ 未找到手机号输入框")
|
||||
|
||||
# 查找所有input元素
|
||||
inputs = await page.query_selector_all('input')
|
||||
print(f" 找到 {len(inputs)} 个input元素")
|
||||
|
||||
if content_len == 0:
|
||||
print(f" ⚠️ 页面内容为空")
|
||||
elif "验证" in content or "captcha" in content.lower() or "安全" in content:
|
||||
print(f" ⚠️ 检测到验证或安全提示")
|
||||
else:
|
||||
print(f" ✅ 页面内容正常")
|
||||
|
||||
except Exception as e:
|
||||
print(f" ❌ {mode_name} - 访问失败: {str(e)}")
|
||||
|
||||
await browser.close()
|
||||
print(f" 🔄 {mode_name} 浏览器已关闭")
|
||||
|
||||
except Exception as e:
|
||||
print(f" ❌ {mode_name} - 测试异常: {str(e)}")
|
||||
|
||||
|
||||
async def test_with_different_wait_strategies(proxy_index: int = 0):
|
||||
"""测试不同的页面等待策略"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"🔍 测试不同页面等待策略")
|
||||
print(f"{'='*60}")
|
||||
|
||||
    from damai_proxy_config import get_proxy_config
    proxy_config = get_proxy_config(proxy_index)

    # Build the Playwright proxy object directly from the configuration
    # instead of assembling a URL and splitting it on '@' and ':'
    server = proxy_config['server']
    if '://' not in server:
        server = f"http://{server}"
    proxy_config_obj = {
        "server": server,
        "username": proxy_config['username'],
        "password": proxy_config['password']
    }
|
||||
|
||||
wait_strategies = [
|
||||
('domcontentloaded', 'DOM内容加载完成'),
|
||||
('load', '页面完全加载'),
|
||||
('networkidle', '网络空闲'),
|
||||
('commit', '导航提交')
|
||||
]
|
||||
|
||||
for wait_strategy, description in wait_strategies:
|
||||
print(f"\n🧪 测试等待策略: {description} ({wait_strategy})")
|
||||
|
||||
try:
|
||||
async with async_playwright() as p:
|
||||
browser = await p.chromium.launch(
|
||||
headless=True, # 使用无头模式进行测试
|
||||
proxy=proxy_config_obj
|
||||
)
|
||||
|
||||
context = await browser.new_context(
|
||||
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
|
||||
)
|
||||
|
||||
page = await context.new_page()
|
||||
|
||||
try:
|
||||
print(f" 访问小红书登录页 (wait_until='{wait_strategy}')...")
|
||||
await page.goto('https://creator.xiaohongshu.com/login',
|
||||
wait_until=wait_strategy,
|
||||
timeout=15000)
|
||||
|
||||
# 额外等待时间
|
||||
await asyncio.sleep(2)
|
||||
|
||||
content = await page.content()
|
||||
content_len = len(content)
|
||||
|
||||
print(f" ✅ 访问成功")
|
||||
print(f" 内容长度: {content_len} 字符")
|
||||
|
||||
# 检查手机号输入框
|
||||
phone_input = await page.query_selector('input[placeholder="手机号"]')
|
||||
if phone_input:
|
||||
print(f" ✅ 找到手机号输入框")
|
||||
else:
|
||||
print(f" ❌ 未找到手机号输入框")
|
||||
|
||||
except Exception as e:
|
||||
print(f" ❌ 访问失败: {str(e)}")
|
||||
|
||||
await browser.close()
|
||||
|
||||
except Exception as e:
|
||||
print(f" ❌ 测试异常: {str(e)}")
|
||||
|
||||
|
||||
def explain_page_loading_factors():
|
||||
"""解释影响页面加载的因素"""
|
||||
print("="*60)
|
||||
print("💡 影响页面加载的因素")
|
||||
print("="*60)
|
||||
|
||||
print("\n1. 浏览器模式差异:")
|
||||
print(" • 有头模式: 浏览器界面可见,渲染更完整")
|
||||
print(" • 无头模式: 后台运行,可能加载策略略有不同")
|
||||
|
||||
print("\n2. 页面等待策略:")
|
||||
print(" • domcontentloaded: DOM构建完成(推荐)")
|
||||
print(" • load: 所有资源加载完成")
|
||||
print(" • networkidle: 网络空闲(可能等待较长时间)")
|
||||
|
||||
print("\n3. 反检测措施:")
|
||||
print(" • 浏览器指纹混淆")
|
||||
print(" • User-Agent设置")
|
||||
print(" • 禁用webdriver属性")
|
||||
|
||||
print("\n4. 网络因素:")
|
||||
print(" • 代理IP质量")
|
||||
print(" • 网络延迟")
|
||||
print(" • 目标网站反爬虫机制")
|
||||
|
||||
|
||||
async def main():
|
||||
"""主函数"""
|
||||
explain_page_loading_factors()
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("🎯 选择测试模式")
|
||||
print("="*60)
|
||||
|
||||
print("\n1. 有头模式 vs 无头模式对比测试")
|
||||
print("2. 不同页面等待策略测试")
|
||||
|
||||
try:
|
||||
choice = input("\n请选择测试模式 (1-2, 默认为1): ").strip()
|
||||
|
||||
if choice not in ['1', '2']:
|
||||
choice = '1'
|
||||
|
||||
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
|
||||
if proxy_choice not in ['0', '1']:
|
||||
proxy_choice = '0'
|
||||
proxy_idx = int(proxy_choice)
|
||||
|
||||
if choice == '1':
|
||||
await test_headless_comparison(proxy_idx)
|
||||
elif choice == '2':
|
||||
await test_with_different_wait_strategies(proxy_idx)
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("✅ 测试完成!")
|
||||
print("="*60)
|
||||
|
||||
except KeyboardInterrupt:
|
||||
print("\n\n⚠️ 测试被用户中断")
|
||||
except Exception as e:
|
||||
print(f"\n❌ 测试过程中出现错误: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Windows环境下设置事件循环策略
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
# 运行测试
|
||||
asyncio.run(main())
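When a proxy is only available as a single URL with embedded credentials, it can be converted to the `server`/`username`/`password` object accepted by Playwright's `launch(proxy=...)` with the standard library instead of manual string splitting. The sketch below assumes the URL includes a scheme and that special characters in the credentials are percent-encoded; `proxy_url_to_playwright` is a hypothetical name and not part of this commit.

```python
from typing import Dict
from urllib.parse import unquote, urlsplit


def proxy_url_to_playwright(proxy_url: str) -> Dict[str, str]:
    """Convert 'scheme://user:pass@host:port' to a Playwright proxy dict.

    Hypothetical helper. Credentials are percent-decoded, so a password
    containing '@' or ':' survives as long as it is encoded in the URL.
    """
    parts = urlsplit(proxy_url)
    if not parts.hostname:
        raise ValueError(f"proxy URL has no host: {proxy_url!r}")

    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"

    proxy = {"server": server}
    if parts.username is not None:
        proxy["username"] = unquote(parts.username)
        proxy["password"] = unquote(parts.password or "")
    return proxy


# 203.0.113.10 is a documentation-only address; the credentials are made up
example = proxy_url_to_playwright("http://demo_user:p%40ss%3Aword@203.0.113.10:50001")
print(example["server"], example["username"])
```

A URL without credentials produces a dict with only the `server` key, which Playwright also accepts.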
|
||||
356
backend/test_headless_mode.py
Normal file
@@ -0,0 +1,356 @@
|
||||
"""
|
||||
使用代理并开启有头模式的示例
|
||||
展示如何在使用代理的同时开启浏览器界面
|
||||
"""
|
||||
import asyncio
|
||||
from playwright.async_api import async_playwright
|
||||
import sys
|
||||
|
||||
|
||||
async def test_proxy_with_headless_false(proxy_index: int = 0):
|
||||
"""使用代理并开启有头模式测试"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"🔍 测试代理 + 有头模式")
|
||||
print(f"{'='*60}")
|
||||
|
||||
# 从代理配置获取代理信息
|
||||
from damai_proxy_config import get_proxy_config
|
||||
proxy_config = get_proxy_config(proxy_index)
|
||||
proxy_server = proxy_config['server'].replace('http://', '')
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
|
||||
|
||||
print(f"✅ 使用代理: 代理{proxy_index + 1}")
|
||||
print(f" 代理服务器: {proxy_config['server']}")
|
||||
print(f" 有头模式: 开启")
|
||||
|
||||
try:
|
||||
async with async_playwright() as p:
|
||||
            # Build the Playwright proxy object directly from the configuration
            # instead of assembling a URL and splitting it on '@' and ':'
            server = proxy_config['server']
            if '://' not in server:
                server = f"http://{server}"
            proxy_config_obj = {
                "server": server,
                "username": proxy_config['username'],
                "password": proxy_config['password']
            }

            # Do not print the password
            print(f"   Proxy object: server={proxy_config_obj['server']}, username={proxy_config_obj['username']}")
|
||||
|
||||
# 启动浏览器 - 使用有头模式
|
||||
browser = await p.chromium.launch(
|
||||
headless=False, # 有头模式,可以看到浏览器界面
|
||||
proxy=proxy_config_obj
|
||||
)
|
||||
|
||||
# 创建上下文
|
||||
context = await browser.new_context(
|
||||
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
|
||||
)
|
||||
|
||||
# 创建页面
|
||||
page = await context.new_page()
|
||||
|
||||
print(f"\n🌐 访问百度测试代理连接...")
|
||||
try:
|
||||
await page.goto('https://www.baidu.com', wait_until='networkidle', timeout=15000)
|
||||
await asyncio.sleep(2)
|
||||
|
||||
title = await page.title()
|
||||
url = page.url
|
||||
print(f" ✅ 百度访问成功")
|
||||
print(f" 标题: {title}")
|
||||
print(f" URL: {url}")
|
||||
except Exception as e:
|
||||
print(f" ❌ 百度访问失败: {str(e)}")
|
||||
|
||||
print(f"\n🌐 访问小红书创作者平台...")
|
||||
try:
|
||||
await page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=15000)
|
||||
await asyncio.sleep(3)
|
||||
|
||||
title = await page.title()
|
||||
url = page.url
|
||||
content_len = len(await page.content())
|
||||
|
||||
print(f" 访问结果:")
|
||||
print(f" 标题: {title}")
|
||||
print(f" URL: {url}")
|
||||
print(f" 内容长度: {content_len} 字符")
|
||||
|
||||
if content_len == 0:
|
||||
print(f" ⚠️ 页面内容为空")
|
||||
else:
|
||||
print(f" ✅ 页面加载成功")
|
||||
|
||||
except Exception as e:
|
||||
print(f" ❌ 小红书访问失败: {str(e)}")
|
||||
|
||||
print(f"\n⏸️ 浏览器保持打开状态,您可以观察页面")
|
||||
print(f" 代理正在生效,您可以看到浏览器界面")
|
||||
print(f" 按 Enter 键关闭浏览器...")
|
||||
|
||||
# 等待用户输入
|
||||
input()
|
||||
|
||||
await browser.close()
|
||||
print(f"✅ 浏览器已关闭")
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 测试过程异常: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
|
||||
|
||||
async def test_xhs_login_with_headless_false(phone: str, proxy_index: int = 0):
    """
    Test the Xiaohongshu login flow in headed mode.

    Args:
        phone: Phone number
        proxy_index: Proxy index (0 or 1)
    """
    print(f"\n{'='*60}")
    print("📱 Testing Xiaohongshu login in headed mode")
    print(f"{'='*60}")

    # Read the proxy settings from the proxy configuration
    from damai_proxy_config import get_proxy_config
    proxy_config = get_proxy_config(proxy_index)
    proxy_server = proxy_config['server'].replace('http://', '')
    proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"

    print(f"✅ Using proxy: proxy {proxy_index + 1}")
    print(f"   Proxy server: {proxy_config['server']}")
    print(f"   Phone number: {phone}")
    print("   Headed mode: on")

    # Create the login service for headed mode
    from xhs_login import XHSLoginService
    login_service = XHSLoginService(use_pool=False)  # no pool, easier to debug

    try:
        # Initialize the browser (proxy + headed mode)
        user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'

        # Note: XHSLoginService uses a browser pool internally; how to change it
        # to support headed mode still needs to be worked out
        print("   Launching browser (proxy + headed mode)...")

        # Create the headed browser directly with Playwright
        async with async_playwright() as p:
            # Configure the proxy. Split on the last '@' and the first ':' so that
            # a password containing either character is kept intact.
            proxy_parts = proxy_url.replace('http://', '').replace('https://', '').rsplit('@', 1)
            if len(proxy_parts) == 2:
                auth_part = proxy_parts[0]
                server_part = proxy_parts[1]
                username, password = auth_part.split(':', 1)

                proxy_config_obj = {
                    "server": f"http://{server_part}",
                    "username": username,
                    "password": password
                }
            else:
                proxy_config_obj = {"server": proxy_url}

            # Launch the browser in headed mode
            browser = await p.chromium.launch(
                headless=False,  # headed mode
                proxy=proxy_config_obj
            )

            context = await browser.new_context(
                user_agent=user_agent,
                viewport={'width': 1280, 'height': 720}
            )

            page = await context.new_page()

            print("✅ Browser launched (headed mode + proxy)")

            # Open the Xiaohongshu login page
            print("\n🌐 Opening the Xiaohongshu creator platform login page...")
            await page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=30000)
            await asyncio.sleep(2)

            print("✅ Login page opened")
            print(f"   Current URL: {page.url}")

            # Look for the phone number input
            print("\n🔍 Looking for the phone number input...")
            try:
                # Try several selectors (the Chinese text must match the page as-is)
                phone_input_selectors = [
                    'input[placeholder="手机号"]',
                    'input[placeholder*="手机"]',
                    'input[type="tel"]',
                    'input[type="text"]'
                ]

                phone_input = None
                for selector in phone_input_selectors:
                    try:
                        phone_input = await page.wait_for_selector(selector, timeout=3000)
                        if phone_input:
                            print(f"   ✅ Found the phone number input: {selector}")
                            break
                    except Exception:
                        continue

                if phone_input:
                    # Fill in the phone number
                    await phone_input.fill(phone)
                    print(f"   ✅ Entered phone number: {phone}")

                    # Wait for the UI to update
                    await asyncio.sleep(1)

                    # Look for the send-code button
                    print("\n🔍 Looking for the send-code button...")
                    code_button_selectors = [
                        'text="发送验证码"',
                        'text="获取验证码"',
                        'button:has-text("验证码")',
                        'button:has-text("发送")',
                        'div:has-text("验证码")'
                    ]

                    code_button = None
                    for selector in code_button_selectors:
                        try:
                            code_button = await page.wait_for_selector(selector, timeout=3000)
                            if code_button:
                                print(f"   ✅ Found the send-code button: {selector}")
                                break
                        except Exception:
                            continue

                    if code_button:
                        print("\nℹ️ Found both the phone number input and the send-code button")
                        print("   You can click the send-code button manually in the browser")
                        print(f"   The code will be sent to: {phone}")

                        print("\n⏸️ The browser stays open so you can operate it manually")
                        print("   Press Enter to close the browser...")
                        input()
                    else:
                        print("   ❌ Send-code button not found")
                else:
                    print("   ❌ Phone number input not found")
                    print("\n📄 Inputs available on the page:")
                    inputs = await page.query_selector_all('input')
                    for i, inp in enumerate(inputs):
                        try:
                            placeholder = await inp.get_attribute('placeholder')
                            input_type = await inp.get_attribute('type')
                            print(f"   Input {i+1}: type={input_type}, placeholder={placeholder}")
                        except Exception:
                            continue

            except Exception as e:
                print(f"   ❌ Operation failed: {str(e)}")

            # Keep the browser open so the user can inspect it
            print("\n⏸️ The browser stays open so you can inspect the page elements")
            print("   Press Enter to close the browser...")
            input()

            await browser.close()
            print("✅ Browser closed")

    except Exception as e:
        print(f"❌ Test failed with an exception: {str(e)}")
        import traceback
        traceback.print_exc()


def show_headless_comparison():
    """Print a comparison of headed and headless mode."""
    print("="*60)
    print("💡 Headed mode vs headless mode")
    print("="*60)

    print("\nHeaded mode (headless=False):")
    print("   ✅ Pros:")
    print("     • The browser window is visible, which helps debugging")
    print("     • You can watch the page load")
    print("     • You can interact with the page manually")
    print("     • It helps identify element selectors")
    print("")
    print("   ❌ Cons:")
    print("     • Takes up screen space")
    print("     • May interfere with other work on the machine")
    print("     • Uses slightly more resources")

    print("\nHeadless mode (headless=True):")
    print("   ✅ Pros:")
    print("     • No browser window; runs in the background")
    print("     • Uses fewer resources")
    print("     • Suited to automated jobs")
    print("     • Can run on a server")
    print("")
    print("   ❌ Cons:")
    print("     • The page cannot be seen directly")
    print("     • Debugging is harder")

    print("\n🎯 Recommendations:")
    print("   • Use headed mode for development and debugging")
    print("   • Use headless mode in production")
    print("   • The proxy configuration works in both modes")


async def main():
    """Entry point."""
    show_headless_comparison()

    print(f"\n{'='*60}")
    print("🎯 Choose a test mode")
    print("="*60)

    print("\n1. Basic proxy + headed mode test")
    print("2. Xiaohongshu login + headed mode test")

    try:
        choice = input("\nChoose a test mode (1-2, default 1): ").strip()

        if choice not in ['1', '2']:
            choice = '1'

        proxy_choice = input("Choose a proxy (0 or 1, default 0): ").strip()
        if proxy_choice not in ['0', '1']:
            proxy_choice = '0'
        proxy_idx = int(proxy_choice)

        if choice == '1':
            await test_proxy_with_headless_false(proxy_idx)
        elif choice == '2':
            phone = input("Enter the phone number: ").strip()
            if not phone:
                print("❌ The phone number must not be empty")
                return
            await test_xhs_login_with_headless_false(phone, proxy_idx)

        print(f"\n{'='*60}")
        print("✅ Test complete!")
        print("="*60)

    except KeyboardInterrupt:
        print("\n\n⚠️ Test interrupted by the user")
    except Exception as e:
        print(f"\n❌ An error occurred during the test: {str(e)}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    # On Windows, set the event loop policy
    if sys.platform == 'win32':
        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())

    # Run the tests
    asyncio.run(main())
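Both test functions in this script build a proxy URL and then split it by hand into the dict that Playwright's `launch(proxy=...)` expects. A minimal sketch of the same conversion using the standard library's URL parser is shown below. The helper name `to_playwright_proxy` is hypothetical (it does not exist in this repository), and the sketch assumes credentials that contain special characters arrive percent-encoded in the URL; IPv6 proxy hosts are not handled.

```python
from urllib.parse import urlsplit, unquote


def to_playwright_proxy(proxy_url: str) -> dict:
    """Convert 'http://user:pass@host:port' into a Playwright-style proxy dict.

    Hypothetical helper, not part of the repository.
    """
    parts = urlsplit(proxy_url)
    if parts.hostname is None:
        raise ValueError(f"not a valid proxy URL: {proxy_url!r}")

    # Rebuild the server address without the credentials
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        server += f":{parts.port}"

    proxy = {"server": server}
    if parts.username is not None:
        # urlsplit leaves percent-escapes in place, so decode them here
        proxy["username"] = unquote(parts.username)
        proxy["password"] = unquote(parts.password or "")
    return proxy
```

With such a helper, each test could call `p.chromium.launch(headless=False, proxy=to_playwright_proxy(proxy_url))` instead of repeating the split logic.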
261
backend/test_login_flow.py
Normal file
@@ -0,0 +1,261 @@
"""
Xiaohongshu verification-code login flow test script
Tests the complete flow of sending a verification code and logging in
"""
import asyncio
import sys
from xhs_login import XHSLoginService


async def test_send_verification_code(phone: str, proxy_index: int = 0):
    """
    Test the send-verification-code step.

    Args:
        phone: Phone number
        proxy_index: Proxy index (0 or 1)
    """
    print(f"\n{'='*60}")
    print("📱 Testing the send-verification-code step")
    print(f"{'='*60}")

    # Read the proxy settings from the proxy configuration
    from damai_proxy_config import get_proxy_config
    proxy_config = get_proxy_config(proxy_index)
    proxy_server = proxy_config['server'].replace('http://', '')
    proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"

    print(f"✅ Using proxy: proxy {proxy_index + 1}")
    print(f"   Proxy server: {proxy_config['server']}")
    print(f"   Phone number: {phone}")

    # Create the login service
    login_service = XHSLoginService()

    try:
        # Initialize the browser (with the proxy)
        user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
        await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
        print("✅ Browser initialized (proxy enabled)")

        # Send the verification code
        print(f"\n📤 Sending a verification code to {phone}...")
        result = await login_service.send_verification_code(phone)

        if result.get('success'):
            print("✅ Verification code sent!")
            print(f"   Message: {result.get('message')}")
            return login_service  # return the service so the caller can log in with it
        else:
            print(f"❌ Failed to send the verification code: {result.get('error')}")
            return None

    except Exception as e:
        print(f"❌ Exception while sending the verification code: {str(e)}")
        import traceback
        traceback.print_exc()
        return None


async def test_login_with_code(login_service: XHSLoginService, phone: str, code: str):
    """
    Test logging in with a verification code.

    Args:
        login_service: XHSLoginService instance
        phone: Phone number
        code: Verification code
    """
    print(f"\n{'='*60}")
    print("🔑 Testing login with a verification code")
    print(f"{'='*60}")

    print(f"   Phone number: {phone}")
    print(f"   Verification code: {code}")

    try:
        # Log in
        result = await login_service.login(phone, code)

        if result.get('success'):
            print("✅ Login succeeded!")

            # Show how many cookies were returned
            cookies = result.get('cookies', {})
            print(f"   Received {len(cookies)} cookies")

            # Save the full cookies to a file
            cookies_full = result.get('cookies_full', [])
            if cookies_full:
                import json
                with open('cookies.json', 'w', encoding='utf-8') as f:
                    json.dump(cookies_full, f, ensure_ascii=False, indent=2)
                print("   ✅ Saved the full cookies to cookies.json")

            # Show the user info keys
            user_info = result.get('user_info', {})
            if user_info:
                print(f"   User info: {list(user_info.keys())}")

            return result
        else:
            print(f"❌ Login failed: {result.get('error')}")
            return result

    except Exception as e:
        print(f"❌ Exception during login: {str(e)}")
        import traceback
        traceback.print_exc()
        return {"success": False, "error": str(e)}


async def test_complete_login_flow(phone: str, code: str = None, proxy_index: int = 0):
    """
    Test the complete login flow.

    Args:
        phone: Phone number
        code: Verification code (if None, only the send step is tested)
        proxy_index: Proxy index
    """
    print("="*60)
    print("🔄 Testing the complete login flow")
    print("="*60)

    # Step 1: send the verification code
    print("\n📋 Step 1: send the verification code")
    login_service = await test_send_verification_code(phone, proxy_index)

    if not login_service:
        print("❌ Failed to send the verification code; aborting")
        return

    # If a verification code was provided, log in with it
    if code:
        print("\n📋 Step 2: log in with the verification code")
        result = await test_login_with_code(login_service, phone, code)

        if result.get('success'):
            print("\n🎉 Complete login flow succeeded!")
        else:
            print(f"\n❌ Complete login flow failed: {result.get('error')}")
    else:
        print("\n⚠️ The login step only runs when a verification code is provided")
        print("   Check your phone for the code, then call the login method")

    # Release resources
    await login_service.close_browser()


async def test_multiple_proxies_login(phone: str, proxy_indices: list = None):
    """
    Test sending the verification code through several proxies.

    Args:
        phone: Phone number
        proxy_indices: List of proxy indices (defaults to [0, 1])
    """
    # Avoid a mutable default argument, which would be shared between calls
    if proxy_indices is None:
        proxy_indices = [0, 1]

    print("="*60)
    print("🔄 Testing login through multiple proxies")
    print("="*60)

    for i, proxy_idx in enumerate(proxy_indices):
        print(f"\n🧪 Testing proxy {proxy_idx + 1} (attempt {i+1})")

        # A verification code can only be sent once, so only the send step is tested here
        login_service = await test_send_verification_code(phone, proxy_idx)

        if login_service:
            print(f"   ✅ Proxy {proxy_idx + 1}: verification code sent")
            await login_service.close_browser()
        else:
            print(f"   ❌ Proxy {proxy_idx + 1}: failed to send the verification code")

        # Pause between tests
        if i < len(proxy_indices) - 1:
            print("   ⏳ Waiting 3 seconds before testing the next proxy...")
            await asyncio.sleep(3)


def show_usage_examples():
    """Print usage examples."""
    print("="*60)
    print("💡 Usage examples")
    print("="*60)

    print("\n1️⃣ Send a verification code only:")
    print("   # Send a verification code to the phone number through proxy 1")
    print("   await test_send_verification_code('13800138000', proxy_index=0)")

    print("\n2️⃣ Complete login flow:")
    print("   # Full flow: send the verification code, then log in")
    print("   await test_complete_login_flow('13800138000', '123456', proxy_index=0)")

    print("\n3️⃣ Multi-proxy test:")
    print("   # Test several proxies")
    print("   await test_multiple_proxies_login('13800138000', [0, 1])")


async def main():
    """Entry point."""
    show_usage_examples()

    print(f"\n{'='*60}")
    print("🎯 Choose a test mode")
    print("="*60)

    print("\n1. Send-verification-code test")
    print("2. Complete login flow test")
    print("3. Multi-proxy test")

    try:
        choice = input("\nChoose a test mode (1-3, default 1): ").strip()

        if choice not in ['1', '2', '3']:
            choice = '1'

        phone = input("Enter the phone number: ").strip()

        if not phone:
            print("❌ The phone number must not be empty")
            return

        if choice == '1':
            proxy_choice = input("Choose a proxy (0 or 1, default 0): ").strip()
            if proxy_choice not in ['0', '1']:
                proxy_choice = '0'
            proxy_idx = int(proxy_choice)

            await test_send_verification_code(phone, proxy_idx)

        elif choice == '2':
            code = input("Enter the verification code (leave empty to test sending only): ").strip()
            proxy_choice = input("Choose a proxy (0 or 1, default 0): ").strip()
            if proxy_choice not in ['0', '1']:
                proxy_choice = '0'
            proxy_idx = int(proxy_choice)

            await test_complete_login_flow(phone, code if code else None, proxy_idx)

        elif choice == '3':
            await test_multiple_proxies_login(phone)

        print(f"\n{'='*60}")
        print("✅ Test complete!")
        print("="*60)

    except KeyboardInterrupt:
        print("\n\n⚠️ Test interrupted by the user")
    except Exception as e:
        print(f"\n❌ An error occurred during the test: {str(e)}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    # On Windows, set the event loop policy
    if sys.platform == 'win32':
        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())

    # Run the tests
    asyncio.run(main())
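In this script `close_browser()` is only called on some paths: mode 1 in `main()` never closes the browser, and `test_send_verification_code` returns `None` on failure without closing it. One way to make the cleanup unconditional is an async context manager, sketched below. The name `browser_session` is hypothetical, and `FakeLoginService` is a stand-in that only mimics the `init_browser` / `close_browser` method names used above; the real `XHSLoginService` behaviour is assumed, not verified.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def browser_session(service, **init_kwargs):
    """Start the service's browser and always close it on exit (hypothetical helper)."""
    await service.init_browser(**init_kwargs)
    try:
        yield service
    finally:
        # Runs on normal exit, on exceptions, and on early return
        await service.close_browser()


class FakeLoginService:
    """Stand-in for XHSLoginService; records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    async def init_browser(self, **kwargs):
        self.calls.append("init")

    async def close_browser(self):
        self.calls.append("close")


async def demo() -> list:
    service = FakeLoginService()
    async with browser_session(service, proxy="http://127.0.0.1:8080"):
        service.calls.append("work")
    return service.calls
```

Note that in this sketch a failure inside `init_browser` itself skips `close_browser`, because the browser is assumed not to exist yet at that point.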
106
backend/test_login_page_config.py
Normal file
@@ -0,0 +1,106 @@
"""
Test the login page configuration
Verifies that the config file controls which login page is used (creator vs home)
"""
import sys
from config import load_config


def test_config_reading():
    """Test reading the configuration."""
    print("="*60)
    print("Testing config file loading")
    print("="*60)

    # Test the dev configuration
    print("\n1. Development environment (config.dev.yaml)")
    config_dev = load_config('dev')
    login_page = config_dev.get_str('login.page', 'creator')
    login_headless = config_dev.get_bool('login.headless', False)

    print(f"   login.page = {login_page}")
    print(f"   login.headless = {login_headless}")

    # Pick the warm-up URL from the configuration
    if login_page == "home":
        preheat_url = "https://www.xiaohongshu.com"
    else:
        preheat_url = "https://creator.xiaohongshu.com/login"

    print(f"   Warm-up URL = {preheat_url}")

    # Test the prod configuration
    print("\n2. Production environment (config.prod.yaml)")
    config_prod = load_config('prod')
    login_page_prod = config_prod.get_str('login.page', 'creator')
    login_headless_prod = config_prod.get_bool('login.headless', False)

    print(f"   login.page = {login_page_prod}")
    print(f"   login.headless = {login_headless_prod}")

    if login_page_prod == "home":
        preheat_url_prod = "https://www.xiaohongshu.com"
    else:
        preheat_url_prod = "https://creator.xiaohongshu.com/login"

    print(f"   Warm-up URL = {preheat_url_prod}")

    print("\n" + "="*60)
    print("✅ Config loading test finished")
    print("="*60)


def test_api_parameter_override():
    """Test that an API parameter overrides the configured default."""
    print("\n" + "="*60)
    print("Testing API parameter override")
    print("="*60)

    config = load_config('dev')
    default_login_page = config.get_str('login.page', 'creator')

    # Simulate the different API parameter cases
    test_cases = [
        (None, "should use the configured default"),
        ("creator", "API specifies creator"),
        ("home", "API specifies home"),
    ]

    for api_param, description in test_cases:
        login_page = api_param if api_param else default_login_page
        print(f"\nScenario: {description}")
        print(f"   Configured default = {default_login_page}")
        print(f"   API parameter = {api_param}")
        print(f"   Effective value = {login_page}")

        # Pick the URL
        if login_page == "home":
            url = "https://www.xiaohongshu.com"
            page_name = "Xiaohongshu home page"
        else:
            url = "https://creator.xiaohongshu.com/login"
            page_name = "Creator center"

        print(f"   → Will visit: {page_name} ({url})")

    print("\n" + "="*60)
    print("✅ API parameter override test finished")
    print("="*60)


if __name__ == "__main__":
    try:
        test_config_reading()
        test_api_parameter_override()

        print("\n🎉 All tests passed!")
        print("\nUsage:")
        print("1. Change login.page in config.dev.yaml or config.prod.yaml")
        print("2. Allowed values: creator (creator center) or home (Xiaohongshu home page)")
        print("3. The login_page parameter of an API request overrides the default from the config file")
        print("4. If the API request omits login_page, the default from the config file is used")

    except Exception as e:
        print(f"\n❌ Test failed: {str(e)}")
        import traceback
        traceback.print_exc()
        sys.exit(1)
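The two functions above only print the selected URL, so the script cannot fail when the override rule is wrong. A minimal sketch of the same rule as a pure function with assertions follows. `resolve_login_url` and `LOGIN_URLS` are hypothetical names introduced for illustration; the URLs and the "API parameter wins, otherwise the config default" rule come from the script itself.

```python
from typing import Optional

# URLs taken from the script above; the mapping itself is illustrative
LOGIN_URLS = {
    "home": "https://www.xiaohongshu.com",
    "creator": "https://creator.xiaohongshu.com/login",
}


def resolve_login_url(api_param: Optional[str], config_default: str = "creator") -> str:
    """Pick the login URL: the API parameter wins, else the configured default.

    Any value other than 'home' falls back to the creator login page,
    matching the if/else in the script.
    """
    page = api_param if api_param else config_default
    return LOGIN_URLS.get(page, LOGIN_URLS["creator"])


def check_override_rule() -> None:
    """Assertion-based version of the three printed scenarios."""
    assert resolve_login_url(None, "home") == LOGIN_URLS["home"]
    assert resolve_login_url("creator", "home") == LOGIN_URLS["creator"]
    assert resolve_login_url("home", "creator") == LOGIN_URLS["home"]
```

Calling `check_override_rule()` raises `AssertionError` on a regression instead of relying on someone reading the printed output.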
246
backend/test_optimized_browser.py
Normal file
@@ -0,0 +1,246 @@
"""
Optimized proxy browser configuration
Works around Xiaohongshu's restrictions on proxy IPs
"""
import asyncio
from playwright.async_api import async_playwright
import sys


async def test_optimized_proxy_browser(proxy_index: int = 0):
    """Test the optimized proxy browser configuration."""
    print(f"\n{'='*60}")
    print("🚀 Testing the optimized proxy browser configuration")
    print(f"{'='*60}")

    # Read the proxy settings from the proxy configuration
    from damai_proxy_config import get_proxy_config
    proxy_config = get_proxy_config(proxy_index)
    proxy_server = proxy_config['server'].replace('http://', '')
    proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"

    print(f"✅ Using proxy: proxy {proxy_index + 1}")
    print(f"   Proxy server: {proxy_config['server']}")

    try:
        async with async_playwright() as p:
            # Configure the proxy. Split on the last '@' and the first ':' so that
            # a password containing either character is kept intact.
            proxy_parts = proxy_url.replace('http://', '').replace('https://', '').rsplit('@', 1)
            if len(proxy_parts) == 2:
                auth_part = proxy_parts[0]
                server_part = proxy_parts[1]
                username, password = auth_part.split(':', 1)

                proxy_config_obj = {
                    "server": f"http://{server_part}",
                    "username": username,
                    "password": password
                }
            else:
                proxy_config_obj = {"server": proxy_url}

            print(f"   Proxy object: {proxy_config_obj}")

            # Launch the browser with the optimized arguments.
            # Note: Chromium generally honors only the last occurrence of a repeated
            # switch, so the many --disable-features entries below likely override one
            # another; merge them into one comma-separated value if all are intended.
            browser = await p.chromium.launch(
                headless=False,  # headed mode, easier to observe
                proxy=proxy_config_obj,
                args=[
                    '--no-sandbox',
                    '--disable-setuid-sandbox',
                    '--disable-dev-shm-usage',
                    '--disable-blink-features=AutomationControlled',
                    '--disable-background-timer-throttling',
                    '--disable-renderer-backgrounding',
                    '--disable-background-networking',
                    '--enable-features=NetworkService,NetworkServiceInProcess',
                    '--disable-ipc-flooding-protection',
                    '--disable-web-security',
                    '--disable-features=IsolateOrigins,site-per-process',
                    '--disable-site-isolation-trials',
                    '--disable-extensions',
                    '--disable-breakpad',
                    '--disable-component-extensions-with-background-pages',
                    '--disable-hang-monitor',
                    '--disable-prompt-on-repost',
                    '--disable-domain-reliability',
                    '--disable-component-update',
                    '--hide-scrollbars',
                    '--mute-audio',
                    '--no-first-run',
                    '--no-default-browser-check',
                    '--metrics-recording-only',
                    '--force-color-profile=srgb',
                    '--disable-default-apps',
                    '--disable-features=TranslateUI',
                    '--disable-features=Translate',
                    '--disable-features=OptimizationHints',
                    '--disable-features=InterestCohortAPI',
                    '--disable-features=BlinkGenPropertyTrees',
                    '--disable-features=ImprovedCookieControls',
                    '--disable-features=SameSiteDefaultChecksMethodRigorously',
                    '--disable-features=CookieSameSiteByDefaultWhenReportingEnabled',
                    '--disable-features=AutofillServerCommunication',
                    '--disable-features=AutofillUseOptimizedLocalStorage',
                    '--disable-features=CalculateNativeWinOcclusion',
                    '--disable-features=VizDisplayCompositor',
                    '--disable-features=VizHitTestQuery',
                ]
            )

            # Create the context with browser fingerprint obfuscation
            context = await browser.new_context(
                user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
                viewport={'width': 1280, 'height': 720},
                # Hide automation traits
                bypass_csp=True,
                java_script_enabled=True,
            )

            # Create the page
            page = await context.new_page()

            # Hide automation traits
            await page.add_init_script("""
                Object.defineProperty(navigator, 'webdriver', {
                    get: () => undefined,
                });

                Object.defineProperty(navigator, 'plugins', {
                    get: () => [1, 2, 3, 4, 5],
                });

                Object.defineProperty(navigator, 'languages', {
                    get: () => ['zh-CN', 'zh', 'en'],
                });

                // Remove chromedriver-style automation markers
                delete window.cdc_adoQpoasnfa76pfcZLmcfl_Array;
                delete window.cdc_adoQpoasnfa76pfcZLmcfl_Promise;
                delete window.cdc_adoQpoasnfa76pfcZLmcfl_Symbol;
            """)

            print("\n🌐 Visiting Baidu to test the proxy connection...")
            try:
                await page.goto('https://www.baidu.com', wait_until='domcontentloaded', timeout=15000)
                await asyncio.sleep(2)

                title = await page.title()
                url = page.url
                print("   ✅ Baidu loaded successfully")
                print(f"   Title: {title}")
                print(f"   URL: {url}")
            except Exception as e:
                print(f"   ❌ Failed to load Baidu: {str(e)}")

            print("\n🌐 Visiting the Xiaohongshu creator platform...")
            try:
                await page.goto('https://creator.xiaohongshu.com/login', wait_until='domcontentloaded', timeout=30000)
                await asyncio.sleep(3)  # wait a little longer

                title = await page.title()
                url = page.url
                content = await page.content()
                content_len = len(content)

                print("   Result:")
                print(f"   Title: {title}")
                print(f"   URL: {url}")
                print(f"   Content length: {content_len} characters")

                if content_len == 0:
                    print("   ⚠️ Page content is empty")
                # The Chinese keywords ("verify", "security") must match the page text as-is
                elif "验证" in content or "captcha" in content.lower() or "安全" in content:
                    print("   ⚠️ Verification or security prompt detected")
                else:
                    print("   ✅ Page loaded successfully")

                # Look for the phone number input
                print("\n🔍 Looking for the phone number input...")
                try:
                    phone_input = await page.wait_for_selector('input[placeholder="手机号"]', timeout=5000)
                    if phone_input:
                        print("   ✅ Found the phone number input")
                    else:
                        print("   ❌ Phone number input not found")
                except Exception:
                    print("   ❌ Phone number input not found")

                    # List all input elements
                    inputs = await page.query_selector_all('input')
                    print(f"   Found {len(inputs)} input elements")

                # Look for the send-code button
                print("\n🔍 Looking for the send-code button...")
                try:
                    code_button = await page.wait_for_selector('text="发送验证码"', timeout=5000)
                    if code_button:
                        print("   ✅ Found the send-code button")
                    else:
                        print("   ❌ Send-code button not found")
                except Exception:
                    print("   ❌ Send-code button not found")

            except Exception as e:
                print(f"   ❌ Failed to load Xiaohongshu: {str(e)}")

            print("\n⏸️ The browser stays open so you can inspect the page")
            print("   Press Enter to close the browser...")
            input()

            await browser.close()
            print("✅ Browser closed")

    except Exception as e:
        print(f"❌ Test failed with an exception: {str(e)}")
        import traceback
        traceback.print_exc()


def explain_optimizations():
    """Explain the optimizations."""
    print("="*60)
    print("🔧 Optimizations")
    print("="*60)

    print("\n1. Browser launch arguments:")
    print("   • Add more anti-detection arguments")
    print("   • Disable features that may trigger detection")

    print("\n2. Browser fingerprint obfuscation:")
    print("   • Hide the webdriver trait")
    print("   • Fake the plugin list")
    print("   • Set realistic languages")

    print("\n3. Page load strategy:")
    print("   • Use domcontentloaded instead of networkidle")
    print("   • Increase the timeout")


async def main():
    """Entry point."""
    explain_optimizations()

    print(f"\n{'='*60}")
    print("🎯 Choose a proxy to test")
    print("="*60)

    proxy_choice = input("\nChoose a proxy (0 or 1, default 0): ").strip()
    if proxy_choice not in ['0', '1']:
        proxy_choice = '0'
    proxy_idx = int(proxy_choice)

    await test_optimized_proxy_browser(proxy_idx)

    print(f"\n{'='*60}")
    print("✅ Test complete!")
    print("="*60)


if __name__ == "__main__":
    # On Windows, set the event loop policy
    if sys.platform == 'win32':
        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())

    # Run the tests
    asyncio.run(main())
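The element lookups in these browser tests all follow one pattern: try a selector, treat a timeout as "not found", and move on. A minimal sketch of that pattern as a reusable coroutine is shown below. `find_first` is a hypothetical helper, and `FakePage` is a stand-in that only imitates `page.wait_for_selector(selector, timeout=...)` so the sketch can run without a browser.

```python
import asyncio


async def find_first(page, selectors, timeout_ms=3000):
    """Return (selector, handle) for the first selector that resolves, else (None, None)."""
    for selector in selectors:
        try:
            handle = await page.wait_for_selector(selector, timeout=timeout_ms)
        except Exception:
            # Timeouts and selector errors both mean "try the next candidate"
            continue
        if handle:
            return selector, handle
    return None, None


class FakePage:
    """Stand-in for a Playwright page: resolves only the selectors it was given."""

    def __init__(self, present):
        self.present = set(present)

    async def wait_for_selector(self, selector, timeout=3000):
        if selector in self.present:
            return f"<handle {selector}>"
        raise TimeoutError(selector)


async def demo():
    page = FakePage({'input[type="tel"]'})
    return await find_first(page, ['input[placeholder="手机号"]', 'input[type="tel"]'])
```

Catching `Exception` rather than using a bare `except:` keeps `KeyboardInterrupt` and task cancellation from being swallowed while the loop waits on a selector.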
51
backend/test_oss.py
Normal file
@@ -0,0 +1,51 @@
"""
Test the OSS upload feature
"""
import sys
from oss_utils import OSSUploader


def test_oss_connection():
    """Test the OSS connection."""
    print("=" * 60)
    print("Testing the Alibaba Cloud OSS connection")
    print("=" * 60)

    try:
        # Create the OSS uploader
        uploader = OSSUploader()

        print("\n✅ OSS configuration:")
        print(f"   Bucket: {uploader.bucket_name}")
        print(f"   Endpoint: {uploader.endpoint}")
        print(f"   Access Key ID: {uploader.access_key_id[:8]}...")

        # Check that the bucket is reachable
        try:
            # List objects in the bucket (at most one)
            result = uploader.bucket.list_objects(prefix=uploader.base_path, max_keys=1)
            print("\n✅ Bucket access succeeded!")
            print(f"   Base path: {uploader.base_path}")

            if result.object_list:
                print(f"   Sample file: {result.object_list[0].key}")

        except Exception as e:
            print(f"\n❌ Bucket access failed: {e}")
            return False

        print("\n" + "=" * 60)
        print("✅ OSS configuration test passed!")
        print("=" * 60)
        return True

    except Exception as e:
        print(f"\n❌ OSS initialization failed: {e}")
        print("\nPlease check the configuration:")
        print("   1. Are the Access Key ID and Secret correct?")
        print("   2. Is the bucket name correct?")
        print("   3. Does the endpoint region match?")
        return False


if __name__ == "__main__":
    success = test_oss_connection()
    sys.exit(0 if success else 1)
16
backend/test_password_hash.py
Normal file
@@ -0,0 +1,16 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import hashlib

passwords = [
    "123456",
    "password",
    "admin123",
]

print("=== Python SHA256 password hashing test ===")
for pwd in passwords:
    hash_result = hashlib.sha256(pwd.encode('utf-8')).hexdigest()
    print(f"Password: {pwd}")
    print(f"SHA256: {hash_result}\n")
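The script above prints the digests for manual comparison. A minimal sketch that checks them against published SHA-256 values instead is shown below. The helper names are hypothetical; the two expected digests are the well-known SHA-256 values of `123456` and `password` encoded as UTF-8, with no salt, which is assumed to match what the script is meant to verify.

```python
import hashlib

# Widely published unsalted SHA-256 digests of the UTF-8 encoded strings
KNOWN_SHA256 = {
    "123456": "8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92",
    "password": "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8",
}


def sha256_hex(password: str) -> str:
    """Hash a password the same way the script above does."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def check_known_hashes() -> bool:
    """Return True only if every known password hashes to its expected digest."""
    return all(sha256_hex(pwd) == digest for pwd, digest in KNOWN_SHA256.items())
```

A mismatch here would point to an encoding difference (for example a non-UTF-8 encoding or a trailing newline) rather than to the hash function.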
152
backend/test_proxy_connectivity.py
Normal file
@@ -0,0 +1,152 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
Fixed proxy IP test script
Sends requests through the proxy server with requests to check that the proxy works
"""

import requests
import json
from damai_proxy_config import get_proxy_config, get_all_enabled_proxies


def test_proxy_requests(proxy_info, target_url="http://httpbin.org/ip"):
    """
    Test a proxy IP with requests.

    Args:
        proxy_info: Proxy info dict containing server, username, password
        target_url: Target URL to request
    """
    print(f"\n{'='*60}")
    print(f"🔍 Testing proxy: {proxy_info.get('name', 'Unknown')}")
    print(f"   Server: {proxy_info['server']}")
    print(f"   Username: {proxy_info['username']}")
    print(f"   Target URL: {target_url}")
    print(f"{'='*60}")

    # Build the authenticated proxy URL
    proxy_server = proxy_info['server'].replace('http://', '')
    proxy_url = f"http://{proxy_info['username']}:{proxy_info['password']}@{proxy_server}"

    proxies = {
        "http": proxy_url,
        "https": proxy_url
    }

    try:
        # Send the test request
        print("🚀 Sending test request...")
        response = requests.get(target_url, proxies=proxies, timeout=5)  # timeout reduced to 5 seconds

        if response.status_code == 200:
            print(f"✅ Proxy test succeeded! Status code: {response.status_code}")

            # Try to parse the IP information
            try:
                ip_info = response.json()
                print(f"🌐 Current IP info: {json.dumps(ip_info, indent=2, ensure_ascii=False)}")
            except ValueError:
                # The body is not JSON; show the raw text instead
                print(f"🌐 Page content (first 500 characters): {response.text[:500]}")

            return True
        else:
            print(f"❌ Proxy test failed! Status code: {response.status_code}")
            print(f"Response body: {response.text[:200]}")
            return False

    except requests.exceptions.ProxyError:
        print("❌ Proxy error: could not connect to the proxy server")
        return False
    except requests.exceptions.ConnectTimeout:
        print("❌ Connection timeout: the proxy server did not respond in time")
        return False
    except requests.exceptions.RequestException as e:
        print(f"❌ Request exception: {str(e)}")
        return False


def test_all_proxies():
    """Test all configured proxies."""
    print("🎯 Testing all proxy IPs")

    proxies = get_all_enabled_proxies()

    if not proxies:
        print("❌ No enabled proxy configuration found")
        return

    print(f"📊 Found {len(proxies)} proxy IPs")

    results = []
    for i, proxy in enumerate(proxies, 1):
        print(f"\n\n{'#'*60}")
        print(f"# Progress: {i}/{len(proxies)}")
        print(f"{'#'*60}")

        success = test_proxy_requests(proxy)
        results.append({
            'proxy': proxy['name'],
            'server': proxy['server'],
            'success': success
        })

        if i < len(proxies):
            print("\n⏳ Waiting 2 seconds before testing the next proxy...")
            import time
            time.sleep(2)

    # Print the summary
    print(f"\n{'='*60}")
    print("📊 Test summary:")
    print(f"{'='*60}")

    success_count = 0
    for result in results:
        status = "✅ OK" if result['success'] else "❌ Failed"
        print(f"   {result['proxy']} ({result['server']}) - {status}")
        if result['success']:
            success_count += 1

    print(f"\n📈 Overall success rate: {success_count}/{len(results)} ({success_count/len(results)*100:.1f}%)")

    # If any proxy succeeded, list the proxies usable for Xiaohongshu
    successful_proxies = [r for r in results if r['success']]
    if successful_proxies:
        print("\n🎉 These proxies can be used for Xiaohongshu login and publishing:")
        for proxy in successful_proxies:
            print(f"   - {proxy['proxy']}: {proxy['server']}")

    return results


def test_xhs_proxy_format():
    """Print the proxy format used for Xiaohongshu (Playwright)."""
    print(f"\n{'='*60}")
    print("🔧 Proxy format for Playwright")
    print(f"{'='*60}")

    proxies = get_all_enabled_proxies()

    for proxy in proxies:
        server = proxy['server'].replace('http://', '')  # strip the http:// prefix
        proxy_url = f"http://{proxy['username']}:{proxy['password']}@{server}"
        print(f"   {proxy['name']}:")
        print(f"     Server address: {proxy['server']}")
        print(f"     Playwright format: {proxy_url}")
        print()


if __name__ == "__main__":
|
||||
print("🚀 开始测试固定代理IP")
|
||||
|
||||
# 测试代理格式
|
||||
test_xhs_proxy_format()
|
||||
|
||||
# 测试所有代理
|
||||
test_all_proxies()
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("🎉 代理测试完成!")
|
||||
print(f"{'='*60}")
|
||||
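Review note: the scripts in this commit build the credentialed proxy URL either by slicing (`server[7:]`) or with `replace('http://', '')`. Below is a minimal sketch of one shared helper. It assumes the config dict carries the `server`, `username` and `password` keys seen in the diff. `build_proxy_url` is a hypothetical name and is not part of the commit. Whether the consumer of the URL accepts percent-encoded credentials is also an assumption.

```python
from urllib.parse import quote, urlsplit


def build_proxy_url(proxy: dict) -> str:
    """Return scheme://user:password@host:port for a proxy config dict (hypothetical helper)."""
    server = proxy["server"]
    # Tolerate a server written without a scheme, e.g. "1.2.3.4:8080"
    if "://" not in server:
        server = "http://" + server
    parts = urlsplit(server)
    # Percent-encode credentials so characters such as '@' or ':' cannot break the URL
    user = quote(proxy["username"], safe="")
    password = quote(proxy["password"], safe="")
    return f"{parts.scheme}://{user}:{password}@{parts.netloc}"
```

Unlike `server[7:]`, this does not silently truncate an address that lacks the `http://` prefix.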
126
backend/test_proxy_detailed.py
Normal file
@@ -0,0 +1,126 @@
|
||||
"""
|
||||
固定代理IP详细测试脚本
|
||||
测试代理IP在Playwright中的表现,包含更多调试信息
|
||||
"""
|
||||
import asyncio
|
||||
import json
|
||||
import sys
|
||||
from xhs_login import XHSLoginService
|
||||
from damai_proxy_config import get_proxy_config
|
||||
|
||||
|
||||
async def test_proxy_detailed(proxy_index: int = 0):
    """Run a detailed test of one proxy. Returns True if the run completed, False otherwise."""
    print(f"\n{'='*60}")
    print(f"🔍 Detailed proxy test: proxy {proxy_index + 1}")
    print(f"{'='*60}")

    # Load the proxy configuration
    try:
        proxy_config = get_proxy_config(proxy_index)
        # Strip the http:// prefix from the server address, then rebuild the URL with credentials
        server = proxy_config['server'].replace('http://', '')
        proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{server}"
        print(f"✅ Proxy config loaded: proxy {proxy_index + 1}")
        print(f"   Proxy server: {proxy_config['server']}")
    except Exception as e:
        print(f"❌ Failed to load proxy config: {str(e)}")
        return False

    # Create the login service without the browser pool, which is easier to debug
    login_service = XHSLoginService(use_pool=False)

    try:
        # Start the browser through the proxy
        print("\n🚀 Starting browser (with proxy)...")
        user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
        await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
        print("✅ Browser started")

        # Visit an ordinary site (Baidu)
        print("\n📍 Visiting an ordinary site (Baidu)...")
        try:
            await login_service.page.goto('https://www.baidu.com', wait_until='networkidle', timeout=10000)
            await asyncio.sleep(2)
            title = await login_service.page.title()
            url = login_service.page.url
            print("✅ Baidu loaded")
            print(f"   Page title: {title}")
            print(f"   Current URL: {url}")
        except Exception as e:
            print(f"❌ Baidu failed to load: {str(e)}")

        # Visit an IP echo service
        print("\n📍 Visiting the IP echo service...")
        try:
            await login_service.page.goto('http://httpbin.org/ip', wait_until='networkidle', timeout=10000)
            await asyncio.sleep(2)
            content = await login_service.page.content()
            print("✅ IP echo service loaded")
            print(f"   Page content: {content[:200]}...")
        except Exception as e:
            print(f"❌ IP echo service failed to load: {str(e)}")

        # Visit the Xiaohongshu creator platform
        print("\n📍 Visiting the Xiaohongshu creator platform...")
        try:
            # Longer timeout and wait than for the other sites
            await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=20000)
            await asyncio.sleep(3)

            title = await login_service.page.title()
            url = login_service.page.url
            print("✅ Xiaohongshu loaded")
            print(f"   Page title: '{title}'")
            print(f"   Current URL: {url}")

            # Inspect the page content; "验证" means "verification" and also covers "安全验证"
            content = await login_service.page.content()
            if "验证" in content or "captcha" in content.lower() or "block" in content.lower():
                print("⚠️ Possible verification challenge or block detected")
            else:
                print("✅ No verification challenge or block detected")

        except Exception as e:
            print(f"❌ Xiaohongshu failed to load: {str(e)}")
            # Try another site to see whether the proxy is blocked entirely
            try:
                await login_service.page.goto('https://www.google.com', wait_until='networkidle', timeout=10000)
                print("   Hint: the proxy can reach other sites but may be restricted by Xiaohongshu")
            except Exception:
                print("   Hint: the proxy may be blocked entirely")

        print(f"\n✅ Proxy {proxy_index + 1} detailed test finished")
        # The browser is closed in the finally block, so return a status flag
        # rather than the login service, which would already be closed
        return True

    except Exception as e:
        print(f"❌ Proxy {proxy_index + 1} detailed test failed: {str(e)}")
        import traceback
        traceback.print_exc()
        return False
    finally:
        # Always close the browser
        await login_service.close_browser()
|
||||
|
||||
|
||||
async def main():
|
||||
"""主测试函数"""
|
||||
print("\n" + "="*60)
|
||||
print("🎯 固定代理IP详细测试")
|
||||
print("="*60)
|
||||
|
||||
# 测试两个代理
|
||||
for i in range(2):
|
||||
await test_proxy_detailed(i)
|
||||
print(f"\n⏳ 等待3秒后测试下一个代理...")
|
||||
await asyncio.sleep(3)
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("🎉 详细测试完成!")
|
||||
print("="*60)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Windows环境下设置事件循环策略
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
# 运行测试
|
||||
asyncio.run(main())
|
||||
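Review note: the block check in `test_proxy_detailed` searches the raw HTML for `block`, which also matches ordinary CSS such as `display:block`, so it is likely to fire on most pages. Below is a sketch of a check against visible text only, using the standard-library `html.parser`. `looks_blocked` and its default keyword list are hypothetical and would need tuning against real challenge pages.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def looks_blocked(html: str, keywords=("验证", "captcha")) -> bool:
    """Return True if any keyword appears in the page's visible text (hypothetical helper)."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return any(keyword.lower() in text for keyword in keywords)
```

Called as `looks_blocked(await page.content())`, this ignores markup, attributes and stylesheets, so a keyword only counts when it is part of the text a user would see.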
219
backend/test_proxy_xhs.py
Normal file
@@ -0,0 +1,219 @@
|
||||
"""
|
||||
固定代理IP下小红书登录发文功能测试脚本
|
||||
测试使用固定代理IP进行小红书登录和发文功能
|
||||
"""
|
||||
import asyncio
|
||||
import json
|
||||
import sys
|
||||
from xhs_login import XHSLoginService
|
||||
from xhs_publish import XHSPublishService
|
||||
from damai_proxy_config import get_proxy_config
|
||||
|
||||
|
||||
async def test_login_with_proxy(proxy_index: int = 0):
    """Test reaching the Xiaohongshu login page through the given proxy.

    Returns the login service with its browser still open on success
    (the caller must close it), or None on failure.
    """
    print(f"\n{'='*60}")
    print(f"🔍 Proxy login test: proxy {proxy_index + 1}")
    print(f"{'='*60}")

    # Load the proxy configuration
    try:
        proxy_config = get_proxy_config(proxy_index)
        # Strip the http:// prefix from the server address, then rebuild the URL with credentials
        server = proxy_config['server'].replace('http://', '')
        proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{server}"
        print(f"✅ Proxy config loaded: proxy {proxy_index + 1}")
        print(f"   Proxy server: {proxy_config['server']}")
    except Exception as e:
        print(f"❌ Failed to load proxy config: {str(e)}")
        return None

    # Create the login service
    login_service = XHSLoginService()

    try:
        # Start the browser through the proxy
        print("\n🚀 Starting browser (with proxy)...")
        user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
        await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
        print("✅ Browser started")

        # Visit the Xiaohongshu creator platform
        print("\n📍 Visiting the Xiaohongshu creator platform...")
        await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=30000)
        await asyncio.sleep(2)

        title = await login_service.page.title()
        url = login_service.page.url
        print("✅ Page loaded")
        print(f"   Page title: {title}")
        print(f"   Current URL: {url}")

        # Check for a block page or captcha; "验证" means "verification"
        content = await login_service.page.content()
        if "验证" in content or "captcha" in content.lower() or "block" in content.lower():
            print("⚠️ Possible verification challenge or block detected")

        print(f"\n✅ Proxy {proxy_index + 1} connection test finished")
        # The browser stays open here; the caller decides when to close it
        return login_service

    except Exception as e:
        print(f"❌ Proxy {proxy_index + 1} test failed: {str(e)}")
        import traceback
        traceback.print_exc()
        # Close the browser on failure so it is not leaked
        await login_service.close_browser()
        return None
|
||||
|
||||
|
||||
async def test_publish_with_proxy(cookies, proxy_index: int = 0):
|
||||
"""使用指定代理测试小红书发文"""
|
||||
print(f"\n{'='*60}")
|
||||
print(f"📝 开始测试代理发文: 代理{proxy_index + 1}")
|
||||
print(f"{'='*60}")
|
||||
|
||||
# 获取代理配置
|
||||
try:
|
||||
proxy_config = get_proxy_config(proxy_index)
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_config['server'][7:]}" # 移除http://前缀再重新组装
|
||||
print(f"✅ 获取代理配置成功: 代理{proxy_index + 1}")
|
||||
print(f" 代理服务器: {proxy_config['server']}")
|
||||
except Exception as e:
|
||||
print(f"❌ 获取代理配置失败: {str(e)}")
|
||||
return None
|
||||
|
||||
# 准备测试数据
|
||||
title = "【代理测试】固定IP代理发布测试"
|
||||
content = """这是一条通过固定IP代理发布的测试笔记 📝
|
||||
|
||||
测试内容:
|
||||
- 验证代理IP是否正常工作
|
||||
- 检查发布功能是否正常
|
||||
- 确认网络连接稳定性
|
||||
|
||||
如果你看到这条笔记,说明代理发布成功了!
|
||||
|
||||
#代理测试 #自动化发布 #网络测试"""
|
||||
|
||||
# 测试图片(可选)
|
||||
images = [] # 可以添加图片路径进行测试
|
||||
|
||||
# 标签
|
||||
tags = ["代理测试", "自动化发布", "网络测试"]
|
||||
|
||||
try:
|
||||
# 创建发布服务
|
||||
print(f"\n🚀 创建发布服务(使用代理: 代理{proxy_index + 1})...")
|
||||
publisher = XHSPublishService(cookies, proxy=proxy_url)
|
||||
|
||||
# 执行发布
|
||||
print(f"\n📤 开始发布笔记...")
|
||||
result = await publisher.publish(
|
||||
title=title,
|
||||
content=content,
|
||||
images=images if images else None,
|
||||
tags=tags
|
||||
)
|
||||
|
||||
# 显示结果
|
||||
print(f"\n{'='*50}")
|
||||
print("发布结果:")
|
||||
print(json.dumps(result, ensure_ascii=False, indent=2))
|
||||
print("="*50)
|
||||
|
||||
if result.get('success'):
|
||||
print(f"\n✅ 代理{proxy_index + 1} 发布测试成功!")
|
||||
if 'url' in result:
|
||||
print(f"📎 笔记链接: {result['url']}")
|
||||
else:
|
||||
print(f"\n❌ 代理{proxy_index + 1} 发布测试失败: {result.get('error')}")
|
||||
|
||||
return result
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 代理{proxy_index + 1} 发布测试异常: {str(e)}")
|
||||
import traceback
|
||||
traceback.print_exc()
|
||||
return None
|
||||
|
||||
|
||||
async def main():
|
||||
"""主测试函数"""
|
||||
print("\n" + "="*60)
|
||||
print("🎯 固定代理IP下小红书登录发文功能测试")
|
||||
print("="*60)
|
||||
|
||||
# 测试代理连接
|
||||
login_service = None
|
||||
for i in range(2): # 测试两个代理
|
||||
login_service = await test_login_with_proxy(i)
|
||||
if login_service:
|
||||
print(f"✅ 代理{i+1} 连接测试成功,可以用于后续操作")
|
||||
break
|
||||
else:
|
||||
print(f"⚠️ 代理{i+1} 连接测试失败,尝试下一个代理...")
|
||||
|
||||
if not login_service:
|
||||
print("\n❌ 所有代理都无法连接,测试终止")
|
||||
return
|
||||
|
||||
try:
|
||||
# 验证登录状态(虽然我们没有真正的登录,但可以检查Cookie是否有效)
|
||||
print(f"\n🔍 验证当前浏览器状态...")
|
||||
verify_result = await login_service.verify_login_status()
|
||||
print(f"验证结果: {verify_result.get('message', '未知状态')}")
|
||||
except Exception as e:
|
||||
print(f"验证状态时出错: {str(e)}")
|
||||
|
||||
# 如果有cookies.json文件,可以尝试使用已保存的cookies进行发布测试
|
||||
cookies = None
|
||||
try:
|
||||
with open('cookies.json', 'r', encoding='utf-8') as f:
|
||||
cookies = json.load(f)
|
||||
print(f"\n✅ 成功读取 cookies.json,包含 {len(cookies)} 个Cookie")
|
||||
except FileNotFoundError:
|
||||
print(f"\n⚠️ cookies.json 文件不存在,跳过发布测试")
|
||||
print(" 如需测试发布功能,请先登录获取Cookie")
|
||||
|
||||
if cookies:
|
||||
# 使用第一个有效的代理进行发布测试
|
||||
for i in range(2):
|
||||
proxy_config = get_proxy_config(i)
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_config['server'][7:]}"
|
||||
|
||||
# 测试代理连接
|
||||
temp_login = XHSLoginService()
|
||||
try:
|
||||
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
|
||||
await temp_login.init_browser(cookies=cookies, proxy=proxy_url, user_agent=user_agent)
|
||||
|
||||
# 验证登录状态
|
||||
verify_result = await temp_login.verify_login_status()
|
||||
if verify_result.get('logged_in'):
|
||||
print(f"\n✅ 代理{i+1} + Cookie 组合验证成功,开始发布测试")
|
||||
await test_publish_with_proxy(cookies, i)
|
||||
break
|
||||
else:
|
||||
print(f"⚠️ 代理{i+1} + Cookie 组合验证失败")
|
||||
except Exception as e:
|
||||
print(f"⚠️ 代理{i+1} 连接测试失败: {str(e)}")
|
||||
finally:
|
||||
await temp_login.close_browser()
|
||||
else:
|
||||
print("\n❌ 所有代理都无法与Cookie配合使用,发布测试终止")
|
||||
|
||||
# 清理资源
|
||||
if login_service:
|
||||
await login_service.close_browser()
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("🎉 测试完成!")
|
||||
print("="*60)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
# Windows环境下设置事件循环策略
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
# 运行测试
|
||||
asyncio.run(main())
|
||||
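Review note: `main()` handles a missing `cookies.json` but not a malformed one, so a truncated file would raise `json.JSONDecodeError` and abort the run. Below is a minimal sketch of a loader that treats both cases as "no cookies". `load_cookies` is a hypothetical name, and accepting either a list or a dict is an assumption, since the diff only calls `len(cookies)`.

```python
import json
from pathlib import Path


def load_cookies(path="cookies.json"):
    """Return the saved cookies, or None if the file is missing, invalid or empty."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None
    except json.JSONDecodeError:
        return None
    # Assumption: cookies are stored as a non-empty JSON list or object
    if isinstance(data, (list, dict)) and data:
        return data
    return None
```

With this helper the publish step can be guarded by a single `if cookies:` check, as the script already does.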
224
backend/verify_proxy_correct.py
Normal file
@@ -0,0 +1,224 @@
|
||||
"""
|
||||
准确的Playwright代理IP验证脚本
|
||||
验证Playwright是否正确使用了带认证信息的代理IP
|
||||
"""
|
||||
import asyncio
|
||||
from playwright.async_api import async_playwright
|
||||
import requests
|
||||
|
||||
|
||||
async def get_my_ip_requests():
|
||||
"""使用requests获取当前IP(不使用代理)"""
|
||||
try:
|
||||
response = requests.get('http://httpbin.org/ip', timeout=10)
|
||||
if response.status_code == 200:
|
||||
data = response.json()
|
||||
return data.get('origin', 'Unknown')
|
||||
except Exception as e:
|
||||
print(f"获取本机IP失败: {str(e)}")
|
||||
return None
|
||||
|
||||
|
||||
async def get_ip_with_playwright_proxy_correct(proxy_url):
    """Get the exit IP via Playwright, passing proxy credentials as separate fields."""
    import json
    import re

    try:
        async with async_playwright() as p:
            # Split "username:password@host:port" at the last '@', so that a
            # password containing '@' does not break the parse
            proxy_parts = proxy_url.replace('http://', '').replace('https://', '').rsplit('@', 1)
            if len(proxy_parts) == 2:
                auth_part, server_part = proxy_parts

                # Split at the first ':' only, so a password containing ':' survives
                username, password = auth_part.split(':', 1)

                proxy_config = {
                    "server": f"http://{server_part}",
                    "username": username,
                    "password": password
                }

                print(f"   Using proxy config: {proxy_config}")
            else:
                # No credentials in the URL: use it as the server directly
                proxy_config = {"server": proxy_url}

            browser = await p.chromium.launch(headless=True, proxy=proxy_config)
            context = await browser.new_context()
            page = await context.new_page()

            # Visit the IP echo service
            await page.goto('http://httpbin.org/ip', wait_until='networkidle', timeout=15000)

            # Grab the page content
            content = await page.content()
            await browser.close()

            # The JSON body is wrapped in HTML by the browser, so extract it first
            json_match = re.search(r'\{.*\}', content, re.DOTALL)
            if json_match:
                try:
                    ip_data = json.loads(json_match.group())
                    return ip_data.get('origin', 'Unknown')
                except json.JSONDecodeError:
                    print(f"   JSON parsing failed, raw content: {content[:200]}...")
                    return 'JSON Parse Error'

            print(f"   No JSON found, raw content: {content[:200]}...")
            return 'No JSON Found'

    except Exception as e:
        print(f"   Failed to get IP via Playwright + proxy: {str(e)}")
        return f'Error: {str(e)}'
|
||||
|
||||
|
||||
async def test_proxy_formats():
    """Test different proxy formats."""
    print("="*60)
    print("🔍 Testing proxy formats")
    print("="*60)

    # Proxy settings come from the proxy config module
    from damai_proxy_config import get_proxy_config

    # Get the local IP
    print("1️⃣ Getting local IP...")
    local_ip = await get_my_ip_requests()
    print(f"   Local IP: {local_ip}")

    for i in range(2):
        print(f"\n2️⃣ Testing proxy {i+1}...")
        proxy_config = get_proxy_config(i)

        print(f"   Proxy info: {proxy_config}")

        # Format 1: http://username:password@host:port
        server = proxy_config['server'].replace('http://', '')
        proxy_url_format1 = f"http://{proxy_config['username']}:{proxy_config['password']}@{server}"
        print(f"   Format 1 (full URL): {proxy_url_format1}")

        # Test format 1
        ip_with_proxy1 = await get_ip_with_playwright_proxy_correct(proxy_url_format1)
        print(f"   IP with format 1: {ip_with_proxy1}")

        # Failures come back as sentinel strings. Errors look like "Error: <message>",
        # so they must be matched by prefix; comparing against the literal 'Error:'
        # never matches and would report a failed run as a success.
        failed = (
            ip_with_proxy1 in ('JSON Parse Error', 'No JSON Found')
            or ip_with_proxy1.startswith('Error:')
        )
        if ip_with_proxy1 != local_ip and not failed:
            print("   ✅ Format 1 succeeded: IP changed, proxy is in effect")
        else:
            print("   ❌ Format 1 failed: IP unchanged or an error occurred")

        print()
|
||||
|
||||
|
||||
async def test_direct_proxy_config():
|
||||
"""测试直接使用代理配置对象"""
|
||||
print("="*60)
|
||||
print("🔍 测试直接使用代理配置对象")
|
||||
print("="*60)
|
||||
|
||||
# 获取本机IP
|
||||
print("1️⃣ 获取本机IP...")
|
||||
local_ip = await get_my_ip_requests()
|
||||
print(f" 本机IP: {local_ip}")
|
||||
|
||||
from damai_proxy_config import get_proxy_config
|
||||
|
||||
for i in range(2):
|
||||
print(f"\n2️⃣ 测试代理 {i+1} (直接配置)...")
|
||||
proxy_config = get_proxy_config(i)
|
||||
|
||||
# 构建Playwright代理配置对象
|
||||
playwright_proxy_config = {
|
||||
"server": proxy_config['server'],
|
||||
"username": proxy_config['username'],
|
||||
"password": proxy_config['password']
|
||||
}
|
||||
|
||||
print(f" Playwright代理配置: {playwright_proxy_config}")
|
||||
|
||||
try:
|
||||
async with async_playwright() as p:
|
||||
browser = await p.chromium.launch(headless=True, proxy=playwright_proxy_config)
|
||||
context = await browser.new_context()
|
||||
page = await context.new_page()
|
||||
|
||||
# 访问IP检测网站
|
||||
await page.goto('http://httpbin.org/ip', wait_until='networkidle', timeout=15000)
|
||||
|
||||
# 获取页面内容
|
||||
content = await page.content()
|
||||
await browser.close()
|
||||
|
||||
# 解析IP
|
||||
import json
|
||||
import re
|
||||
json_match = re.search(r'\{.*\}', content, re.DOTALL)
|
||||
if json_match:
|
||||
try:
|
||||
ip_data = json.loads(json_match.group())
|
||||
ip_address = ip_data.get('origin', 'Unknown')
|
||||
print(f" 代理{i+1} IP: {ip_address}")
|
||||
|
||||
if ip_address != local_ip:
|
||||
print(f" ✅ 代理{i+1}成功: IP已改变,代理生效")
|
||||
else:
|
||||
print(f" ❌ 代理{i+1}失败: IP未改变")
|
||||
except:
|
||||
print(f" ❌ 代理{i+1} JSON解析失败: {content[:200]}...")
|
||||
else:
|
||||
print(f" ❌ 代理{i+1} 未找到IP信息: {content[:200]}...")
|
||||
|
||||
except Exception as e:
|
||||
print(f" ❌ 代理{i+1}连接失败: {str(e)}")
|
||||
|
||||
|
||||
def explain_proxy_formats():
|
||||
"""解释不同的代理格式"""
|
||||
print("="*60)
|
||||
print("📋 代理格式说明")
|
||||
print("="*60)
|
||||
|
||||
print("\n在Playwright中使用代理的两种方式:")
|
||||
print("\n1️⃣ 字典格式(推荐):")
|
||||
print(" proxy = {")
|
||||
print(" 'server': 'http://proxy-server:port',")
|
||||
print(" 'username': 'your_username',")
|
||||
print(" 'password': 'your_password'")
|
||||
print(" }")
|
||||
print(" browser = await playwright.chromium.launch(proxy=proxy)")
|
||||
|
||||
print("\n2️⃣ URL格式(包含认证信息):")
|
||||
print(" proxy_url = 'http://username:password@proxy-server:port'")
|
||||
print(" # 需要从中提取认证信息并构建字典格式")
|
||||
|
||||
print("\n⚠️ 注意:")
|
||||
print(" - 不能直接使用包含认证信息的URL字符串作为proxy.server")
|
||||
print(" - 必须将认证信息分离到单独的username和password字段")
|
||||
print(" - 代理服务器地址格式应为: http://host:port")
|
||||
|
||||
|
||||
async def main():
|
||||
"""主函数"""
|
||||
explain_proxy_formats()
|
||||
|
||||
print("\n" + "="*60)
|
||||
|
||||
# 测试直接代理配置
|
||||
await test_direct_proxy_config()
|
||||
|
||||
print("\n" + "="*60)
|
||||
|
||||
# 测试不同格式
|
||||
await test_proxy_formats()
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("✅ 验证完成!")
|
||||
print("="*60)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import sys
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
asyncio.run(main())
|
||||
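Review note: `explain_proxy_formats` states that credentials must be split out of the URL into separate `username` and `password` fields. Below is a sketch of that conversion using the standard-library `urllib.parse` instead of manual string splitting. `to_playwright_proxy` is a hypothetical name, and the assumption that credentials in the URL are percent-encoded may not hold for the URLs built elsewhere in this commit.

```python
from urllib.parse import unquote, urlsplit


def to_playwright_proxy(proxy_url: str) -> dict:
    """Convert scheme://user:password@host:port into a server/username/password dict."""
    parts = urlsplit(proxy_url)
    # Drop the userinfo part; keep host and port exactly as written
    host = parts.netloc.rpartition("@")[2]
    proxy = {"server": f"{parts.scheme}://{host}"}
    if parts.username is not None:
        # Assumption: credentials in the URL are percent-encoded
        proxy["username"] = unquote(parts.username)
        proxy["password"] = unquote(parts.password or "")
    return proxy
```

The resulting dict has the same shape as the `proxy_config` that `test_direct_proxy_config` passes to `p.chromium.launch(proxy=...)`.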
230
backend/verify_proxy_usage.py
Normal file
@@ -0,0 +1,230 @@
|
||||
"""
|
||||
Playwright代理IP验证脚本
|
||||
验证Playwright浏览器是否使用了代理IP而不是本机IP
|
||||
"""
|
||||
import asyncio
|
||||
from playwright.async_api import async_playwright
|
||||
import requests
|
||||
import json
|
||||
|
||||
|
||||
async def get_my_ip_requests():
|
||||
"""使用requests获取当前IP(不使用代理)"""
|
||||
try:
|
||||
response = requests.get('http://httpbin.org/ip', timeout=10)
|
||||
if response.status_code == 200:
|
||||
data = response.json()
|
||||
return data.get('origin', 'Unknown')
|
||||
except Exception as e:
|
||||
print(f"获取本机IP失败: {str(e)}")
|
||||
return None
|
||||
|
||||
|
||||
async def get_browser_ip_via_playwright(proxy_url=None):
    """Get the browser's exit IP via Playwright, optionally through a proxy."""
    import re
    from urllib.parse import urlsplit

    try:
        async with async_playwright() as p:
            # Launch options
            launch_kwargs = {
                "headless": True,  # headless mode
            }

            # If a proxy is given, pass the credentials as separate fields.
            # As verify_proxy_correct.py notes, a URL with embedded credentials
            # cannot be used directly as proxy.server.
            if proxy_url:
                parts = urlsplit(proxy_url)
                host = parts.netloc.rpartition('@')[2]
                proxy = {"server": f"{parts.scheme}://{host}"}
                if parts.username:
                    proxy["username"] = parts.username
                    proxy["password"] = parts.password or ""
                launch_kwargs["proxy"] = proxy

            browser = await p.chromium.launch(**launch_kwargs)
            context = await browser.new_context()
            page = await context.new_page()

            # Visit the IP echo service
            await page.goto('http://httpbin.org/ip', wait_until='networkidle', timeout=10000)

            # Grab the page content
            content = await page.content()

            # Close the browser
            await browser.close()

            # Extract the JSON body from the HTML wrapper and read the IP
            json_match = re.search(r'\{.*\}', content, re.DOTALL)
            if json_match:
                try:
                    ip_data = json.loads(json_match.group())
                    return ip_data.get('origin', 'Unknown')
                except json.JSONDecodeError:
                    pass

            return 'Parse Error'

    except Exception as e:
        print(f"Failed to get IP via Playwright: {str(e)}")
        return None
|
||||
|
||||
|
||||
async def verify_proxy_usage():
|
||||
"""验证代理IP使用情况"""
|
||||
print("="*60)
|
||||
print("🔍 Playwright代理IP使用验证")
|
||||
print("="*60)
|
||||
|
||||
# 1. 获取本机IP
|
||||
print("\n1️⃣ 获取本机IP地址...")
|
||||
local_ip = await get_my_ip_requests()
|
||||
if local_ip:
|
||||
print(f" ✅ 本机IP: {local_ip}")
|
||||
else:
|
||||
print(" ❌ 无法获取本机IP")
|
||||
return
|
||||
|
||||
# 2. 测试不使用代理时的IP
|
||||
print("\n2️⃣ 测试不使用代理时的IP...")
|
||||
browser_ip_no_proxy = await get_browser_ip_via_playwright()
|
||||
print(f" 🌐 Playwright无代理IP: {browser_ip_no_proxy}")
|
||||
|
||||
# 3. 测试使用代理时的IP
|
||||
print("\n3️⃣ 测试使用代理时的IP...")
|
||||
|
||||
# 从代理配置中获取代理信息
|
||||
from damai_proxy_config import get_proxy_config
|
||||
|
||||
for i in range(2):
|
||||
try:
|
||||
proxy_config = get_proxy_config(i)
|
||||
proxy_server = proxy_config['server'].replace('http://', '')
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
|
||||
|
||||
print(f" 代理{i+1}: {proxy_config['server']}")
|
||||
|
||||
# 获取使用代理时的IP
|
||||
browser_ip_with_proxy = await get_browser_ip_via_playwright(proxy_url)
|
||||
print(f" 🌐 Playwright使用代理{i+1}的IP: {browser_ip_with_proxy}")
|
||||
|
||||
# 比较IP地址
|
||||
if browser_ip_with_proxy == local_ip:
|
||||
print(f" ❌ 代理{i+1}测试失败: IP与本机IP相同,代理未生效")
|
||||
elif browser_ip_with_proxy == proxy_server.split(':')[0]: # 检查是否是代理服务器IP
|
||||
print(f" ✅ 代理{i+1}测试成功: 使用了代理IP")
|
||||
elif browser_ip_with_proxy != 'Parse Error' and browser_ip_with_proxy != local_ip:
|
||||
print(f" ✅ 代理{i+1}测试成功: IP已改变,代理生效")
|
||||
else:
|
||||
print(f" ⚠️ 代理{i+1}测试结果不确定: {browser_ip_with_proxy}")
|
||||
|
||||
except Exception as e:
|
||||
print(f" ❌ 代理{i+1}测试出错: {str(e)}")
|
||||
|
||||
print() # 空行分隔
|
||||
|
||||
|
||||
async def advanced_proxy_verification():
|
||||
"""高级代理验证 - 使用多个IP检测服务"""
|
||||
print("="*60)
|
||||
print("🔬 高级代理IP验证")
|
||||
print("="*60)
|
||||
|
||||
# IP检测服务列表
|
||||
ip_services = [
|
||||
'http://httpbin.org/ip',
|
||||
'https://api.ipify.org?format=json',
|
||||
'https://jsonip.com',
|
||||
'https://httpbin.org/ip'
|
||||
]
|
||||
|
||||
from damai_proxy_config import get_proxy_config
|
||||
|
||||
for i in range(2):
|
||||
try:
|
||||
proxy_config = get_proxy_config(i)
|
||||
proxy_server = proxy_config['server'].replace('http://', '')
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
|
||||
|
||||
print(f"\n📊 验证代理 {i+1}: {proxy_config['server']}")
|
||||
print("-" * 50)
|
||||
|
||||
async with async_playwright() as p:
|
||||
launch_kwargs = {"headless": True, "proxy": {"server": proxy_url}}
|
||||
browser = await p.chromium.launch(**launch_kwargs)
|
||||
context = await browser.new_context()
|
||||
page = await context.new_page()
|
||||
|
||||
for service in ip_services:
|
||||
try:
|
||||
print(f" 正在测试: {service}")
|
||||
await page.goto(service, wait_until='networkidle', timeout=10000)
|
||||
content = await page.content()
|
||||
|
||||
# 尝试解析IP
|
||||
import re
|
||||
import json
|
||||
json_match = re.search(r'\{.*\}', content, re.DOTALL)
|
||||
if json_match:
|
||||
try:
|
||||
data = json.loads(json_match.group())
|
||||
ip = data.get('origin') or data.get('ip') or 'Unknown'
|
||||
print(f" ✅ {service}: {ip}")
|
||||
except:
|
||||
print(f" ❌ {service}: JSON解析失败")
|
||||
else:
|
||||
print(f" ❌ {service}: 未找到JSON数据")
|
||||
except Exception as e:
|
||||
print(f" ❌ {service}: {str(e)}")
|
||||
|
||||
await browser.close()
|
||||
|
||||
except Exception as e:
|
||||
print(f"❌ 代理{i+1}高级验证失败: {str(e)}")
|
||||
|
||||
|
||||
def show_proxy_format():
|
||||
"""显示代理格式"""
|
||||
print("="*60)
|
||||
print("🔧 Playwright代理格式参考")
|
||||
print("="*60)
|
||||
|
||||
from damai_proxy_config import get_proxy_config
|
||||
|
||||
for i in range(2):
|
||||
proxy_config = get_proxy_config(i)
|
||||
proxy_server = proxy_config['server'].replace('http://', '')
|
||||
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
|
||||
|
||||
print(f"\n代理{i+1}:")
|
||||
print(f" 原始地址: {proxy_config['server']}")
|
||||
print(f" 用户名: {proxy_config['username']}")
|
||||
print(f" 密码: {proxy_config['password']}")
|
||||
print(f" Playwright格式: {proxy_url}")
|
||||
print(f" 使用示例:")
|
||||
print(f" browser = await playwright.chromium.launch(")
|
||||
print(f" proxy={{'server': '{proxy_url}'}}")
|
||||
print(f" )")
|
||||
|
||||
|
||||
async def main():
|
||||
"""主函数"""
|
||||
# 显示代理格式
|
||||
show_proxy_format()
|
||||
|
||||
print("\n" + "="*60)
|
||||
|
||||
# 基础验证
|
||||
await verify_proxy_usage()
|
||||
|
||||
# 高级验证(可选,可能会比较耗时)
|
||||
user_input = input("\n是否进行高级验证? 这将测试多个IP服务 (y/N): ")
|
||||
if user_input.lower() == 'y':
|
||||
await advanced_proxy_verification()
|
||||
|
||||
print(f"\n{'='*60}")
|
||||
print("✅ 验证完成!")
|
||||
print("="*60)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import sys
|
||||
if sys.platform == 'win32':
|
||||
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
|
||||
|
||||
asyncio.run(main())
|
||||
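Review note: the same "find JSON in the page HTML, then read `origin` or `ip`" logic is repeated in several functions across the two verification scripts, each with a bare `except:`. Below is a sketch of a single pure helper that could replace those copies. `extract_ip` is a hypothetical name, and it returns `None` on failure rather than the sentinel strings the scripts use.

```python
import json
import re


def extract_ip(content: str):
    """Return the IP reported by an IP echo page, or None if none can be parsed."""
    # Browsers wrap a JSON response in HTML, so pull out the outermost {...} first
    match = re.search(r"\{.*\}", content, re.DOTALL)
    if not match:
        return None
    try:
        data = json.loads(match.group())
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # httpbin.org uses "origin"; api.ipify.org and jsonip.com use "ip"
    return data.get("origin") or data.get("ip")
```

Because the helper takes the page content as a string, it can be checked without launching a browser.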
1806
backend/xhs_login.py
File diff suppressed because it is too large
@@ -181,9 +181,28 @@ class XHSPublishService:
|
||||
|
||||
local_images = []
|
||||
|
||||
# OSS域名前缀(用于补充不完整的图片路径)
|
||||
oss_prefix = "https://bxmkb-beijing.oss-cn-beijing.aliyuncs.com/Images/"
|
||||
|
||||
print(f"\n正在处理 {len(images)} 张图片...", file=sys.stderr)
|
||||
|
||||
for i, img in enumerate(images):
|
||||
# 检查是否需要补充OSS前缀
|
||||
original_img = img
|
||||
print(f" [调试] 处理图片 {i+1}: '{img}'", file=sys.stderr)
|
||||
print(f" [调试] is_url={self.is_url(img)}, isabs={os.path.isabs(img)}", file=sys.stderr)
|
||||
|
||||
if not self.is_url(img) and not os.path.isabs(img):
|
||||
# 不是URL也不是绝对路径,检查是否需要补充OSS前缀
|
||||
print(f" [调试] 不是URL也不是绝对路径", file=sys.stderr)
|
||||
# 如果路径不包含协议且不以/开头,可能是相对OSS路径
|
||||
if '/' in img and not img.startswith('/'):
|
||||
# 可能是OSS相对路径,补充前缀
|
||||
img = oss_prefix + img
|
||||
print(f" ✅ 检测到相对路径,补充OSS前缀: {original_img} -> {img}", file=sys.stderr)
|
||||
else:
|
||||
print(f" [调试] 不满足补充条件: '/' in img={('/' in img)}, not startswith('/')={not img.startswith('/')}", file=sys.stderr)
|
||||
|
||||
if self.is_url(img):
|
||||
# 网络URL,需要下载
|
||||
try:
|
||||
@@ -195,9 +214,25 @@ class XHSPublishService:
|
||||
continue
|
||||
else:
|
||||
# 本地路径
|
||||
if os.path.exists(img):
|
||||
local_images.append(os.path.abspath(img))
|
||||
print(f" ✅ 本地图片 [{i + 1}]: {os.path.basename(img)}", file=sys.stderr)
|
||||
# 先尝试直接使用,如果不存在则尝试相对路径
|
||||
abs_path = None
|
||||
|
||||
# 1. 尝试作为绝对路径
|
||||
if os.path.isabs(img) and os.path.exists(img):
|
||||
abs_path = img
|
||||
# 2. 尝试相对于当前工作目录
|
||||
elif os.path.exists(img):
|
||||
abs_path = os.path.abspath(img)
|
||||
# 3. 尝试相对于 static 目录
|
||||
elif os.path.exists(os.path.join('static', img)):
|
||||
abs_path = os.path.abspath(os.path.join('static', img))
|
||||
# 4. 尝试相对于 ../go_backend/static 目录
|
||||
elif os.path.exists(os.path.join('..', 'go_backend', 'static', img)):
|
||||
abs_path = os.path.abspath(os.path.join('..', 'go_backend', 'static', img))
|
||||
|
||||
if abs_path:
|
||||
local_images.append(abs_path)
|
||||
print(f" ✅ 本地图片 [{i + 1}]: {os.path.basename(abs_path)} ({abs_path})", file=sys.stderr)
|
||||
else:
|
||||
print(f" ⚠️ 本地图片不存在: {img}", file=sys.stderr)
|
||||
|
||||
|
||||
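Review note: in the hunk above, any relative path containing `/` is rewritten to an OSS URL before the local lookups run, so a local file such as `static/a.jpg` would be downloaded from OSS instead of being found on disk. Below is a sketch of an alternative ordering that checks local directories first and falls back to the OSS prefix. `resolve_image`, `is_url` and the placeholder prefix are hypothetical and only mirror the names in the diff; the ordering is a suggestion, not the commit's behaviour.

```python
import os

# Placeholder prefix for illustration; the real value lives in xhs_publish.py
OSS_PREFIX = "https://example.invalid/Images/"


def is_url(value: str) -> bool:
    return value.startswith(("http://", "https://"))


def resolve_image(img, search_dirs=(".", "static"), oss_prefix=OSS_PREFIX):
    """Return a URL to download, an absolute local path, or None if unresolved."""
    if is_url(img):
        return img
    if os.path.isabs(img):
        return img if os.path.exists(img) else None
    # Prefer a local file when one exists in any of the search directories
    for base in search_dirs:
        candidate = os.path.join(base, img)
        if os.path.exists(candidate):
            return os.path.abspath(candidate)
    # Otherwise treat "dir/name.jpg" style paths as OSS-relative
    if "/" in img:
        return oss_prefix + img
    return None
```

The caller can then branch once on `is_url(result)` to decide between downloading and using the local file.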