sjk
2026-01-06 19:36:42 +08:00
parent 15b579d64a
commit 19942144fb
261 changed files with 24034 additions and 5477 deletions

25
backend/.env.example Normal file

@@ -0,0 +1,25 @@
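# Example environment variables for the Python service
# Copy this file to .env and adjust it as needed
# ========== Runtime environment ==========
# Allowed values: dev, prod
# Default: dev
ENV=dev
# ========== Optional: override the database settings from the config file ==========
# If set, the variables below override the corresponding values in config.{ENV}.yaml
# DB_HOST=localhost
# DB_PORT=3306
# DB_USER=root
# DB_PASSWORD=your_password
# DB_NAME=ai_wht
# ========== Optional: override the scheduler settings ==========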
# SCHEDULER_ENABLED=true
# SCHEDULER_CRON=*/5 * * * * *
# SCHEDULER_MAX_CONCURRENT=2
# SCHEDULER_PUBLISH_TIMEOUT=300
# SCHEDULER_MAX_ARTICLES_PER_USER_PER_RUN=2
# SCHEDULER_MAX_FAILURES_PER_USER_PER_RUN=3
# SCHEDULER_MAX_DAILY_ARTICLES_PER_USER=6
# SCHEDULER_MAX_HOURLY_ARTICLES_PER_USER=2

112
backend/CONFIG_GUIDE.md Normal file

@@ -0,0 +1,112 @@
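# Python Service Configuration Guide
## Configuration file layout
The Python service now uses the same configuration file layout as the Go service:
```
backend/
├── config.dev.yaml # Development configuration
├── config.prod.yaml # Production configuration
├── config.py # Configuration loader module
├── .env.example # Example environment variables
└── .env # Environment variables (create manually; ignored by Git)
```
## Switching environments
Set the `ENV` environment variable to switch environments: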
### Windows (CMD)
```bat
set ENV=dev
python main.py
```
### Windows (PowerShell)
```powershell
$env:ENV="dev"
python main.py
```
### Linux/Mac
```bash
ENV=dev python main.py
```
Alternatively, set it in the `.env` file:
```
ENV=dev
```
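This guide does not show how the service reads `.env`. The following is a minimal sketch of one way to do it, assuming plain `KEY=VALUE` lines and assuming that variables already set in the process environment should win. The helper names `parse_env_lines` and `apply_env` are hypothetical and are not part of this project.

```python
import os
from typing import Dict, Iterable, MutableMapping


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values: Dict[str, str],
              environ: MutableMapping[str, str] = os.environ) -> None:
    """Copy parsed values into the environment without overriding existing ones."""
    for key, value in values.items():
        environ.setdefault(key, value)
```

A typical call would be `apply_env(parse_env_lines(open(".env", encoding="utf-8")))` before the configuration is loaded. If the project already uses a library for this, that library should be preferred over this sketch.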
## Configuration priority
1. **Environment variables** - highest priority
2. **Configuration file** - config.{ENV}.yaml
3. **Code defaults** - lowest priority
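The three-level priority can be sketched as a single lookup function. This is an illustration only: `resolve_setting` is a hypothetical helper, not the project's loader, and it assumes two-level keys such as `database.host`.

```python
import os
from typing import Any, Mapping, Optional


def resolve_setting(
    env_name: str,
    file_config: Mapping[str, Any],
    file_key: str,
    default: Any,
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Return the environment value if set, else the file value, else the default."""
    environ = os.environ if environ is None else environ
    env_value = environ.get(env_name)
    if env_value:  # an empty string counts as "not set"
        return env_value
    # file_key is "section.name", for example "database.host"
    section, _, name = file_key.partition(".")
    file_value = (file_config.get(section) or {}).get(name)
    return default if file_value is None else file_value
```

Values that come from the environment are always strings, so a caller that needs a number (for example a port) still has to convert the result.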
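## Configuration reference
### Development (config.dev.yaml)
- **Database**: local MySQL (localhost:3306)
- **Scheduler**: enabled, runs every 5 seconds (for testing)
- **Log level**: DEBUG
### Production (config.prod.yaml)
- **Database**: remote MySQL (8.149.233.36:3306)
- **Scheduler**: enabled, runs every 5 minutes
- **Log level**: INFO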
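## Usage examples
### 1. Development
Create a `.env` file:
```bash
ENV=dev
```
Start the service:
```bash
python main.py
```
### 2. Production
Create a `.env` file:
```bash
ENV=prod
```
Start the service:
```bash
python main.py
```
### 3. Overriding settings
To change individual settings temporarily, add them to `.env`: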
```bash
ENV=dev
DB_HOST=192.168.1.100
SCHEDULER_CRON=0 */10 * * * *
```
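The `SCHEDULER_CRON` values in this project have six fields; the comments in the YAML configuration files give the order as second, minute, hour, day, month, weekday. A small sketch that names the fields can make such an expression easier to read. `split_cron` is a hypothetical helper for illustration and does not validate the field contents.

```python
from typing import Dict

# Field order as documented in the YAML config comments.
CRON_FIELDS = ("second", "minute", "hour", "day", "month", "weekday")


def split_cron(expression: str) -> Dict[str, str]:
    """Map a six-field cron expression onto named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))
```

With this, `0 */10 * * * *` reads as second `0` of every tenth minute, while `*/5 * * * * *` puts the step in the seconds field and therefore fires every 5 seconds.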
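## Mapping to the Go service configuration
| Python config | Go config | Description |
|-----------|--------|------|
| config.dev.yaml | config/config.dev.yaml | Development configuration |
| config.prod.yaml | config/config.prod.yaml | Production configuration |
| ENV environment variable | ENV environment variable | Environment switch |
| database.username | database.username | Database username |
| database.dbname | database.dbname | Database name |
## Notes
1. **Password security**: change the database password in `config.prod.yaml` for production
2. **Git ignore**: the `.env` file is ignored by Git and is not committed to the repository
3. **Environment variables**: environment variables override the corresponding settings in the configuration file
4. **Scheduler frequency**: by default the scheduler runs every 5 seconds in development and every 5 minutes in production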


@@ -0,0 +1,266 @@
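# Damai Fixed Proxy IP Guide
## 📋 Overview
This project integrates two Damai fixed proxy IPs. They can be used for headless-browser access and support full HTTP authentication.
## 🌐 Proxy configuration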
### 代理1
- **服务器**: `36.137.177.131:50001`
- **用户名**: `qqwvy0`
- **密码**: `mun3r7xz`
- **状态**: ✅ 已测试可用
### 代理2
- **服务器**: `111.132.40.72:50002`
- **用户名**: `ih3z07`
- **密码**: `078bt7o5`
- **状态**: ✅ 已测试可用
## 📂 Related files
| File | Description |
|--------|------|
| `damai_proxy_config.py` | Proxy configuration module |
| `test_damai_proxy.py` | Proxy test script |
| `example_use_damai_proxy.py` | Usage examples |
## 🚀 Quick start
### 1. Test proxy availability
```bash
# Test all proxies
python test_damai_proxy.py
# Test a single proxy
python test_damai_proxy.py 0 # Test proxy 1
python test_damai_proxy.py 1 # Test proxy 2
```
### 2. Use in code
#### Option 1: use the configuration module
```python
from damai_proxy_config import get_proxy_1, get_proxy_2, get_random_proxy
# Get a specific proxy
proxy = get_proxy_1() # or get_proxy_2()
# Get a random proxy
proxy = get_random_proxy()
print(proxy)
# Output: {'server': 'http://...', 'username': '...', 'password': '...'}
```
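Some HTTP clients expect a single proxy URL with the credentials embedded instead of a dict. A minimal sketch of that conversion follows, assuming the dict shape shown above; `proxy_to_url` is a hypothetical helper and is not part of `damai_proxy_config.py`.

```python
from urllib.parse import quote, urlsplit


def proxy_to_url(proxy: dict) -> str:
    """Embed the username and password into the proxy server URL."""
    parts = urlsplit(proxy["server"])
    # Percent-encode credentials so characters such as "@" or ":" stay unambiguous.
    user = quote(proxy["username"], safe="")
    password = quote(proxy["password"], safe="")
    return f"{parts.scheme}://{user}:{password}@{parts.netloc}"
```

The resulting URL contains the password in clear text, so it should not be written to logs.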
#### Option 2: use with Playwright
```python
from playwright.async_api import async_playwright
from damai_proxy_config import get_proxy_1
async def use_proxy():
    proxy_config = get_proxy_1()
    playwright = await async_playwright().start()
    # Configure the proxy (with authentication)
    browser = await playwright.chromium.launch(
        headless=True,
        proxy={
            "server": proxy_config["server"],
            "username": proxy_config["username"],
            "password": proxy_config["password"]
        }
    )
    context = await browser.new_context()
    page = await context.new_page()
    # Visit the target site
    await page.goto("https://www.damai.cn/")
    await browser.close()
    await playwright.stop()
```
#### Option 3: integrate with browser_pool
```python
from browser_pool import get_browser_pool
from damai_proxy_config import get_random_proxy
async def use_with_pool():
    # Get a proxy configuration
    proxy = get_random_proxy()
    # Note: browser_pool currently needs changes to support authenticated proxies
    pool = get_browser_pool()
    browser, context, page = await pool.get_browser(
        proxy=proxy["server"]  # Basic usage: server address only, no credentials
    )
```
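As noted in the example, `browser_pool` passes only the server address to Playwright. One possible change is sketched here, under the assumption that the pool would accept either a plain URL string or a dict shaped like the entries in `damai_proxy_config.py`. `to_playwright_proxy` is a hypothetical helper; the `server`, `username` and `password` keys are the ones Playwright's `launch(proxy=...)` option accepts.

```python
from typing import Dict, Optional, Union

ProxyInput = Union[str, Dict[str, str]]


def to_playwright_proxy(proxy: Optional[ProxyInput]) -> Optional[Dict[str, str]]:
    """Normalize a proxy given as a URL string or a config dict."""
    if not proxy:
        return None
    if isinstance(proxy, str):
        # Existing behaviour: a bare server address without credentials.
        return {"server": proxy}
    settings = {"server": proxy["server"]}
    # Forward credentials only when both are present.
    if proxy.get("username") and proxy.get("password"):
        settings["username"] = proxy["username"]
        settings["password"] = proxy["password"]
    return settings
```

Inside the pool, a line such as `launch_kwargs["proxy"] = {"server": proxy}` could then become `launch_kwargs["proxy"] = to_playwright_proxy(proxy)`, which keeps string callers working while allowing authenticated proxies.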
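## 🔧 API reference
### damai_proxy_config.py
#### `get_proxy_config(index: int) -> dict`
Returns the proxy configuration at the given index.
**Parameters:**
- `index`: proxy index (0 or 1)
**Returns:**
```python
{
    "server": "http://...",
    "username": "...",
    "password": "..."
}
```
#### `get_proxy_1() -> dict`
Shortcut that returns the proxy 1 configuration.
#### `get_proxy_2() -> dict`
Shortcut that returns the proxy 2 configuration.
#### `get_random_proxy() -> dict`
Returns a randomly chosen available proxy.
#### `get_all_enabled_proxies() -> list`
Returns the list of all enabled proxies.
## ✅ Test results
All proxies passed the following tests:
1. **IP detection test** - confirmed that the proxy IP address is correct
2. **Xiaohongshu access test** - successfully reached the Xiaohongshu creator platform
3. **Damai.cn access test** - successfully reached Damai.cn
### Sample test log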
```
🔍 开始测试: 大麦代理1
代理服务器: http://36.137.177.131:50001
认证信息: qqwvy0 / mun3r7xz
============================================================
✅ Playwright启动成功
✅ 浏览器启动成功
✅ 浏览器上下文创建成功
✅ 页面创建成功
📍 测试1: 访问IP检测网站...
✅ 访问成功
🌐 当前IP信息:
{
"origin": "36.137.177.131"
}
📍 测试2: 访问小红书登录页...
✅ 访问成功
页面标题: 小红书创作服务平台
📍 测试3: 访问大麦网...
✅ 访问成功
页面标题: 大麦网-全球演出赛事官方购票平台
```
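## 🎯 Use cases
1. **Anti-scraping evasion** - a fixed IP avoids the risk that comes with frequently changing IPs
2. **Regional restrictions** - use an IP from a specific region to reach region-locked content
3. **Load balancing** - rotate across multiple proxies to spread the request load
4. **Failover** - switch to a backup proxy automatically when one proxy fails
## ⚠️ Notes
1. **Credential security**: the proxy usernames and passwords are configured in code; use environment variables in production
2. **Proxy rotation**: implement a rotation mechanism so that a single IP does not get banned
3. **Error handling**: add retry and switch-over logic for proxy failures
4. **Performance impact**: a proxy adds network latency, so weigh it against your actual needs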
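The rotation and switch-over recommendations can be sketched as a small round-robin helper. This is an illustration under stated assumptions: `ProxyRotator` is a hypothetical class that does not exist in this project, and it assumes pool entries carry `server` and `enabled` keys like those in `DAMAI_PROXY_POOL`.

```python
import itertools
from typing import Dict, Iterator, List, Set

Proxy = Dict[str, object]


class ProxyRotator:
    """Round-robin over enabled proxies, skipping ones marked as failed."""

    def __init__(self, pool: List[Proxy]) -> None:
        self._enabled = [p for p in pool if p.get("enabled", True)]
        if not self._enabled:
            raise ValueError("no enabled proxies in pool")
        self._cycle: Iterator[Proxy] = itertools.cycle(self._enabled)
        self._failed: Set[object] = set()

    def mark_failed(self, proxy: Proxy) -> None:
        """Exclude a proxy from rotation, identified by its server address."""
        self._failed.add(proxy["server"])

    def next_proxy(self) -> Proxy:
        """Return the next healthy proxy in round-robin order."""
        # Look at each enabled proxy at most once per call.
        for _ in range(len(self._enabled)):
            candidate = next(self._cycle)
            if candidate["server"] not in self._failed:
                return candidate
        raise RuntimeError("all proxies are marked as failed")
```

A caller would request `next_proxy()` for each job, call `mark_failed()` when a connection or authentication error occurs, and then retry with the next proxy. A production version would probably also clear failures after a cool-down period, which this sketch leaves out.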
## 🔄 Proxy management
### Enabling and disabling proxies
Edit `damai_proxy_config.py` and change the `enabled` field of the proxy entry:
```python
DAMAI_PROXY_POOL = [
{
"name": "大麦代理1",
"server": "http://36.137.177.131:50001",
"username": "qqwvy0",
"password": "mun3r7xz",
"enabled": True # Set to False to disable this proxy
},
# ...
]
```
### Adding a new proxy
Add a new proxy entry to the `DAMAI_PROXY_POOL` list:
```python
{
"name": "New proxy",
"server": "http://ip:port",
"username": "username",
"password": "password",
"enabled": True
}
```
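Since the notes in this guide recommend environment variables for credentials in production, the pool could also be loaded from the environment instead of being edited in code. The sketch below assumes the whole pool is stored as a JSON array in one variable. Both the variable name `DAMAI_PROXY_POOL_JSON` and the function `load_proxy_pool` are hypothetical.

```python
import json
import os
from typing import Dict, List, Mapping, Optional

REQUIRED_KEYS = ("name", "server", "username", "password")


def load_proxy_pool(
    environ: Optional[Mapping[str, str]] = None,
    var_name: str = "DAMAI_PROXY_POOL_JSON",
) -> List[Dict[str, object]]:
    """Read the proxy pool from a JSON array stored in an environment variable."""
    environ = os.environ if environ is None else environ
    pool = json.loads(environ.get(var_name, "[]"))
    if not isinstance(pool, list):
        raise ValueError(f"{var_name} must contain a JSON array")
    for entry in pool:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"proxy entry is missing keys: {missing}")
        # Entries are enabled unless they say otherwise.
        entry.setdefault("enabled", True)
    return pool
```

With this approach, adding or disabling a proxy becomes a deployment setting rather than a code change, and credentials stay out of the repository.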
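## 📊 Performance
Measured proxy response times:
- IP detection: ~2-3 s
- Xiaohongshu: ~3-5 s
- Damai.cn: ~3-5 s
## 🛠️ Troubleshooting
### Issue 1: proxy connection times out
**Solutions**:
1. Check that the proxy server is online
2. Verify that the credentials are correct
3. Increase the connection timeout
### Issue 2: authentication fails
**Solutions**:
1. Confirm that the username and password are correct
2. Check whether the proxy requires an IP allowlist
3. Contact the proxy provider to confirm the account status
### Issue 3: access denied
**Solutions**:
1. Switch to the other proxy
2. Check whether the target site has blocked the proxy IP
3. Add appropriate request headers and delays
## 📝 Changelog
### 2025-12-26
- ✅ Initialized the Damai proxy configuration
- ✅ Tested and verified both proxies
- ✅ Created the configuration module
- ✅ Added usage examples and documentation
## 📞 Support
If you run into proxy problems, check:
1. Whether the network connection is working
2. Whether the proxy provider has posted any notices
3. Whether the proxy configuration is correct
---
**Last updated**: 2025-12-26
**Version**: 1.0.0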


@@ -0,0 +1,108 @@
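# Login Page Configuration
## Overview
A configuration setting now controls which page is used to obtain cookies during Xiaohongshu login. Two options are supported:
- **creator**: Creator Center (https://creator.xiaohongshu.com/login)
- **home**: Xiaohongshu home page (https://www.xiaohongshu.com)
## How to configure
### 1. Edit the configuration file
Find the `login` section in `config.dev.yaml` or `config.prod.yaml`:
```yaml
# ========== Login/binding configuration ==========
login:
  headless: false # Browser mode used for login/binding
  page: "creator" # Login page type: creator or home
```
Set `page` to the login page you want:
- `"creator"`: use the Creator Center login page
- `"home"`: use the Xiaohongshu home page login
### 2. Restart the service
Restart the Python backend after changing the configuration so that it takes effect:
```bash
# Windows
cd backend
.\start.bat
# Linux
cd backend
./start.sh
```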
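## Overriding via API parameters
Even when a default is configured, an API request can temporarily override it with the `login_page` parameter:
```javascript
// Send verification code
POST /api/xhs/send-code
{
  "phone": "13800138000",
  "country_code": "+86",
  "login_page": "home" // Optional; falls back to the config file default when omitted
}
// Log in
POST /api/xhs/login
{
  "phone": "13800138000",
  "code": "123456",
  "country_code": "+86",
  "login_page": "home", // Optional; falls back to the config file default when omitted
  "session_id": "xxx"
}
```
## Priority
1. **Highest priority**: the `login_page` parameter in the API request
2. **Default**: the `login.page` setting in the configuration file
3. **Fallback**: `creator`, if neither is set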
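The priority order can be sketched as a small resolver. `resolve_login_page` and `LOGIN_PAGE_URLS` are hypothetical names used for illustration; the two URLs are the ones listed in this document. Treating an unrecognized value as "not set" is an assumption of this sketch, not documented behaviour.

```python
from typing import Optional

LOGIN_PAGE_URLS = {
    "creator": "https://creator.xiaohongshu.com/login",
    "home": "https://www.xiaohongshu.com",
}
FALLBACK_PAGE = "creator"


def resolve_login_page(request_value: Optional[str],
                       config_value: Optional[str]) -> str:
    """Pick the login page: API parameter first, then config, then the fallback."""
    for candidate in (request_value, config_value):
        # Assumption: unknown or missing values are skipped rather than rejected.
        if candidate in LOGIN_PAGE_URLS:
            return candidate
    return FALLBACK_PAGE
```

The page to open is then `LOGIN_PAGE_URLS[resolve_login_page(request_value, config_value)]`.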
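## Testing
Run the test script to verify the configuration:
```bash
cd backend
python test_login_page_config.py
```
## Scope of the setting
Changing `login.page` affects the following:
1. **Send verification code endpoint** (`/api/xhs/send-code`)
2. **Login endpoint** (`/api/xhs/login`)
3. **Browser pool preheat URL** (adjusted automatically to match the setting)
## Notes
1. The HTML structure of the two login pages may differ slightly; if one causes problems, try the other
2. Test in the development environment before applying a change to production
3. The service must be restarted for configuration changes to take effect
4. If an API request explicitly passes `login_page`, that value takes precedence over the configuration file
## Example scenarios
### Scenario 1: use the Creator Center everywhere
```yaml
login:
  page: "creator"
```
When no API parameter is passed, every request logs in through the Creator Center.
### Scenario 2: use the home page by default, with the Creator Center for individual requests
```yaml
login:
  page: "home"
```
Most requests use the home page; in special cases the API can pass `"login_page": "creator"` to switch temporarily.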

203
backend/ali_sms_service.py Normal file

@@ -0,0 +1,203 @@
"""
Alibaba Cloud SMS service module.
Used to send phone verification codes.
"""
import json
import random
import sys
from typing import Dict, Any, Optional
from datetime import datetime, timedelta
from alibabacloud_dysmsapi20170525.client import Client as Dysmsapi20170525Client
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_credentials.models import Config as CredentialConfig
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_dysmsapi20170525 import models as dysmsapi_20170525_models
from alibabacloud_tea_util import models as util_models
class AliSmsService:
    """Alibaba Cloud SMS service."""
    def __init__(self, access_key_id: str, access_key_secret: str, sign_name: str, template_code: str):
        """
        Initialize the Alibaba Cloud SMS service.
        Args:
            access_key_id: Alibaba Cloud AccessKey ID
            access_key_secret: Alibaba Cloud AccessKey Secret
            sign_name: SMS signature
            template_code: SMS template code
        """
        self.sign_name = sign_name
        self.template_code = template_code
        # Create the Alibaba Cloud SMS client
        credential_config = CredentialConfig(
            type='access_key',
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        credential = CredentialClient(credential_config)
        config = open_api_models.Config(credential=credential)
        config.endpoint = 'dysmsapi.aliyuncs.com'
        self.client = Dysmsapi20170525Client(config)
        # Verification code cache (simple in-memory store; use Redis in production)
        self._code_cache: Dict[str, Dict[str, Any]] = {}
        # Verification code settings
        self.code_length = 6  # Code length
        self.code_expire_minutes = 5  # Code lifetime (minutes)
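    def _generate_code(self) -> str:
        """Generate a random numeric verification code."""
        # SystemRandom draws from the OS entropy source, which suits one-time codes better than the default PRNG.
        rng = random.SystemRandom()
        return ''.join(str(rng.randint(0, 9)) for _ in range(self.code_length))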
    async def send_verification_code(self, phone: str) -> Dict[str, Any]:
        """
        Send a verification code to the given phone number.
        Args:
            phone: Phone number
        Returns:
            Dict containing success status and error message if any
        """
        try:
            # Generate the verification code
            code = self._generate_code()
            print(f"[短信服务] 正在发送验证码到 {phone},验证码: {code}", file=sys.stderr)
            # Build the SMS request
            send_sms_request = dysmsapi_20170525_models.SendSmsRequest(
                phone_numbers=phone,
                sign_name=self.sign_name,
                template_code=self.template_code,
                template_param=json.dumps({"code": code})
            )
            runtime = util_models.RuntimeOptions()
            # Send the SMS (this call is synchronous, so it blocks the event loop while it runs)
            try:
                resp = self.client.send_sms_with_options(send_sms_request, runtime)
                resp_dict = resp.to_map()
                print(f"[短信服务] 阿里云响应: {json.dumps(resp_dict, default=str, indent=2, ensure_ascii=False)}", file=sys.stderr)
                # Check the send result
                if resp_dict.get('body', {}).get('Code') == 'OK':
                    # Cache the verification code
                    self._code_cache[phone] = {
                        'code': code,
                        'expire_time': datetime.now() + timedelta(minutes=self.code_expire_minutes),
                        'sent_at': datetime.now()
                    }
                    print(f"[短信服务] 验证码发送成功,手机号: {phone}", file=sys.stderr)
                    return {
                        "success": True,
                        "message": f"验证码已发送,{self.code_expire_minutes}分钟内有效",
                        "code": code  # Returned for development only; remove in production
                    }
                else:
                    error_msg = resp_dict.get('body', {}).get('Message', '未知错误')
                    print(f"[短信服务] 发送失败: {error_msg}", file=sys.stderr)
                    return {
                        "success": False,
                        "error": f"短信发送失败: {error_msg}"
                    }
            except Exception as e:
                error_msg = str(e)
                print(f"[短信服务] 发送异常: {error_msg}", file=sys.stderr)
                # Print the diagnostic URL if the SDK provided one
                if hasattr(e, 'data') and e.data:
                    recommend = e.data.get('Recommend')
                    if recommend:
                        print(f"[短信服务] 诊断地址: {recommend}", file=sys.stderr)
                return {
                    "success": False,
                    "error": f"短信发送异常: {error_msg}"
                }
        except Exception as e:
            print(f"[短信服务] 发送验证码失败: {str(e)}", file=sys.stderr)
            return {
                "success": False,
                "error": str(e)
            }
    def verify_code(self, phone: str, code: str) -> Dict[str, Any]:
        """
        Verify a phone number and verification code pair.
        Args:
            phone: Phone number
            code: Verification code entered by the user
        Returns:
            Dict containing verification result
        """
        try:
            # Check that a code exists for this phone number
            if phone not in self._code_cache:
                return {
                    "success": False,
                    "error": "验证码未发送或已过期,请重新获取"
                }
            cached_data = self._code_cache[phone]
            # Check whether the code has expired
            if datetime.now() > cached_data['expire_time']:
                # Remove the expired code
                del self._code_cache[phone]
                return {
                    "success": False,
                    "error": "验证码已过期,请重新获取"
                }
            # Compare the codes
            if code == cached_data['code']:
                # Remove the code after a successful check (single use)
                del self._code_cache[phone]
                print(f"[短信服务] 验证码验证成功,手机号: {phone}", file=sys.stderr)
                return {
                    "success": True,
                    "message": "验证码验证成功"
                }
            else:
                return {
                    "success": False,
                    "error": "验证码错误,请重新输入"
                }
        except Exception as e:
            print(f"[短信服务] 验证码验证失败: {str(e)}", file=sys.stderr)
            return {
                "success": False,
                "error": str(e)
            }
    def cleanup_expired_codes(self):
        """Remove expired verification codes from the cache."""
        current_time = datetime.now()
        expired_phones = [
            phone for phone, data in self._code_cache.items()
            if current_time > data['expire_time']
        ]
        for phone in expired_phones:
            del self._code_cache[phone]
        if expired_phones:
            print(f"[短信服务] 已清理 {len(expired_phones)} 个过期验证码", file=sys.stderr)

553
backend/browser_pool.py Normal file

@@ -0,0 +1,553 @@
"""
Browser pool management module.
Manages the lifecycle of Playwright browser instances and supports reuse to improve performance.
"""
import asyncio
import time
from typing import Optional, Dict, Any
from playwright.async_api import async_playwright, Browser, BrowserContext, Page
import sys
class BrowserPool:
"""浏览器池管理器(单例模式)"""
def __init__(self, idle_timeout: int = 1800, max_instances: int = 5, headless: bool = True):
"""
初始化浏览器池
Args:
idle_timeout: 空闲超时时间默认30分钟已禁用保持常驻
max_instances: 最大浏览器实例数默认5个
headless: 是否使用无头模式False为有头模式方便调试
"""
self.playwright = None
self.browser: Optional[Browser] = None
self.context: Optional[BrowserContext] = None
self.page: Optional[Page] = None
self.last_used_time = 0
self.idle_timeout = idle_timeout
self.max_instances = max_instances
self.headless = headless
self.is_initializing = False
self.init_lock = asyncio.Lock()
self.is_preheated = False # 标记是否已预热
# 临时浏览器实例池(用于并发请求)
self.temp_browsers: Dict[str, Dict] = {} # {session_id: {browser, context, page, created_at}}
self.temp_lock = asyncio.Lock()
print(f"[浏览器池] 已创建,常驻模式(不自动清理),最大实例数: {max_instances}", file=sys.stderr)
async def get_browser(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
user_agent: Optional[str] = None, session_id: Optional[str] = None,
headless: Optional[bool] = None) -> tuple[Browser, BrowserContext, Page]:
"""
获取浏览器实例(复用或新建)
Args:
cookies: 可选的Cookie列表
proxy: 可选的代理地址
user_agent: 可选的自定义User-Agent
session_id: 会话 ID用于区分不同的并发请求
headless: 可选的headless模式为None时使用默认配置
Returns:
(browser, context, page) 三元组
"""
# 如果没有指定headless使用默认配置
if headless is None:
headless = self.headless
# 如果主浏览器可用且无会话 ID使用主浏览器
if not session_id:
async with self.init_lock:
# 检查现有浏览器是否可用
if await self._is_browser_alive():
print("[浏览器池] 复用主浏览器实例", file=sys.stderr)
self.last_used_time = time.time()
# 如果需要注入Cookie直接添加到现有的context不创建新context
if cookies:
print(f"[浏览器池] 在现有context中注入 {len(cookies)} 个Cookie", file=sys.stderr)
await self.context.add_cookies(cookies)
return self.browser, self.context, self.page
else:
# 创建新浏览器
print("[浏览器池] 创建主浏览器实例", file=sys.stderr)
await self._init_browser(cookies, proxy, user_agent)
self.last_used_time = time.time()
return self.browser, self.context, self.page
# 并发请求:复用或创建临时浏览器
else:
async with self.temp_lock:
# 首先检查是否已存在该session_id的临时浏览器
if session_id in self.temp_browsers:
print(f"[浏览器池] 复用会话 {session_id} 的临时浏览器", file=sys.stderr)
browser_info = self.temp_browsers[session_id]
return browser_info["browser"], browser_info["context"], browser_info["page"]
# 检查是否超过最大实例数
if len(self.temp_browsers) >= self.max_instances - 1: # -1 留给主浏览器
print(f"[浏览器池] ⚠️ 已达最大实例数 ({self.max_instances}),等待释放...", file=sys.stderr)
# TODO: 可以实现等待队列,这里直接报错
raise Exception(f"浏览器实例数已满,请稍后再试")
print(f"[浏览器池] 为会话 {session_id} 创建临时浏览器 ({len(self.temp_browsers)+1}/{self.max_instances-1})", file=sys.stderr)
# 创建临时浏览器传入headless参数
browser, context, page = await self._create_temp_browser(cookies, proxy, user_agent, headless)
# 保存到临时池
self.temp_browsers[session_id] = {
"browser": browser,
"context": context,
"page": page,
"created_at": time.time()
}
return browser, context, page
async def _is_browser_alive(self) -> bool:
"""检查浏览器是否存活(不检查超时,保持常驻)"""
if not self.browser or not self.context or not self.page:
return False
# 注意:为了保持浏览器常驻,不再检查空闲超时
# 原代码:
# if time.time() - self.last_used_time > self.idle_timeout:
# print(f"[浏览器池] 浏览器空闲超时 ({self.idle_timeout}秒),需要重建", file=sys.stderr)
# await self.close()
# return False
# 检查浏览器是否仍在运行
try:
# 尝试获取页面标题来验证连接
await self.page.title()
return True
except Exception as e:
print(f"[浏览器池] 浏览器连接失效: {str(e)}", file=sys.stderr)
await self.close()
return False
async def _init_browser(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
user_agent: Optional[str] = None):
"""初始化新浏览器实例"""
try:
# 启动Playwright
if not self.playwright:
# Windows环境下需要设置事件循环策略
if sys.platform == 'win32':
# 设置为ProactorEventLoop或SelectorEventLoop
try:
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
except Exception as e:
print(f"[浏览器池] 警告: 设置事件循环策略失败: {str(e)}", file=sys.stderr)
self.playwright = await async_playwright().start()
print("[浏览器池] Playwright启动成功", file=sys.stderr)
# 启动浏览器(性能优先配置)
launch_kwargs = {
"headless": self.headless, # 使用配置的headless参数
"args": [
'--disable-blink-features=AutomationControlled', # 隐藏自动化特征
'--no-sandbox', # Linux环境必需
'--disable-setuid-sandbox',
'--disable-dev-shm-usage', # 使用/tmp而非/dev/shm避免内存不足
# 性能优化
'--disable-web-security', # 禁用同源策略(提升加载速度)
'--disable-features=IsolateOrigins,site-per-process', # 禁用站点隔离(提升性能)
'--disable-site-isolation-trials',
'--enable-features=NetworkService,NetworkServiceInProcess', # 网络服务优化
'--disable-background-timer-throttling', # 禁用后台限速
'--disable-backgrounding-occluded-windows',
'--disable-renderer-backgrounding', # 渲染进程不降优先级
'--disable-background-networking',
# 缓存和存储优化
'--disk-cache-size=268435456', # 256MB磁盘缓存
'--media-cache-size=134217728', # 128MB媒体缓存
# 渲染优化保留GPU支持
'--enable-gpu-rasterization', # 启用GPU光栅化
'--enable-zero-copy', # 零拷贝优化
'--ignore-gpu-blocklist', # 忽略GPU黑名单
'--enable-accelerated-2d-canvas', # 加速2D canvas
# 网络优化
'--enable-quic', # 启用QUIC协议
'--enable-tcp-fast-open', # TCP快速打开
'--max-connections-per-host=10', # 每个主机最大连接数
# 减少不必要的功能
'--disable-extensions',
'--disable-breakpad', # 禁用崩溃报告
'--disable-component-extensions-with-background-pages',
'--disable-ipc-flooding-protection', # 禁用IPC洪水保护提升性能
'--disable-hang-monitor', # 禁用挂起监控
'--disable-prompt-on-repost',
'--disable-domain-reliability',
'--disable-component-update',
# 界面优化
'--hide-scrollbars',
'--mute-audio',
'--no-first-run',
'--no-default-browser-check',
'--metrics-recording-only',
'--force-color-profile=srgb',
],
}
if proxy:
launch_kwargs["proxy"] = {"server": proxy}
self.browser = await self.playwright.chromium.launch(**launch_kwargs)
print("[浏览器池] Chromium浏览器启动成功", file=sys.stderr)
# 创建上下文
await self._create_new_context(cookies, proxy, user_agent)
except Exception as e:
print(f"[浏览器池] 初始化浏览器失败: {str(e)}", file=sys.stderr)
await self.close()
raise
async def _create_new_context(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
user_agent: Optional[str] = None):
"""创建新的浏览器上下文"""
try:
# 关闭旧上下文
if self.context:
await self.context.close()
print("[浏览器池] 已关闭旧上下文", file=sys.stderr)
# 创建新上下文
context_kwargs = {
"viewport": {'width': 1280, 'height': 720},
"user_agent": user_agent or 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
}
self.context = await self.browser.new_context(**context_kwargs)
# 注入Cookie
if cookies:
await self.context.add_cookies(cookies)
print(f"[浏览器池] 已注入 {len(cookies)} 个Cookie", file=sys.stderr)
# 创建页面
self.page = await self.context.new_page()
print("[浏览器池] 新页面创建成功", file=sys.stderr)
except Exception as e:
print(f"[浏览器池] 创建上下文失败: {str(e)}", file=sys.stderr)
raise
async def close(self):
"""关闭浏览器池"""
try:
if self.page:
await self.page.close()
self.page = None
if self.context:
await self.context.close()
self.context = None
if self.browser:
await self.browser.close()
self.browser = None
if self.playwright:
await self.playwright.stop()
self.playwright = None
print("[浏览器池] 浏览器已关闭", file=sys.stderr)
except Exception as e:
print(f"[浏览器池] 关闭浏览器异常: {str(e)}", file=sys.stderr)
async def cleanup_if_idle(self):
"""清理空闲浏览器(定时任务调用)- 已禁用,保持常驻"""
# 注意:为了保持浏览器常驻,不再自动清理
# 原代码:
# if self.browser and time.time() - self.last_used_time > self.idle_timeout:
# print(f"[浏览器池] 检测到空闲超时,自动清理浏览器", file=sys.stderr)
# await self.close()
pass # 不再执行清理操作
async def preheat(self, target_url: str = "https://creator.xiaohongshu.com/login"):
"""
预热浏览器:提前初始化并访问目标页面
Args:
target_url: 预热目标页面,默认为小红书登录页
"""
try:
print("[浏览器预热] 开始预热浏览器...", file=sys.stderr)
# 初始化浏览器
await self._init_browser()
self.last_used_time = time.time()
# 访问目标页面
print(f"[浏览器预热] 正在访问: {target_url}", file=sys.stderr)
await self.page.goto(target_url, wait_until='domcontentloaded', timeout=45000)
# 等待页面完全加载
await asyncio.sleep(1)
self.is_preheated = True
print("[浏览器预热] ✅ 预热完成,浏览器已就绪!", file=sys.stderr)
print(f"[浏览器预热] 当前页面: {self.page.url}", file=sys.stderr)
except Exception as e:
print(f"[浏览器预热] ⚠️ 预热失败: {str(e)}", file=sys.stderr)
print("[浏览器预热] 将在首次使用时再初始化", file=sys.stderr)
self.is_preheated = False
async def repreheat(self, target_url: str = "https://creator.xiaohongshu.com/login"):
"""
补充预热:在后台重新将浏览器预热到目标页面
用于在主浏览器被使用后,重新预热以保证下次使用的性能
重要:如果浏览器正在使用中(有临时实例),跳过预热避免干扰
Args:
target_url: 预热目标页面,默认为小红书登录页
"""
# 关键优化:检查是否有临时浏览器正在使用
if len(self.temp_browsers) > 0:
print(f"[浏览器补充预热] 检测到 {len(self.temp_browsers)} 个临时浏览器正在使用,跳过预热避免干扰", file=sys.stderr)
return
# 检查主浏览器是否正在被使用(通过最近使用时间判断)
time_since_last_use = time.time() - self.last_used_time
if time_since_last_use < 10: # 最近10秒内使用过可能还在操作中
print(f"[浏览器补充预热] 主浏览器最近 {time_since_last_use:.1f}秒前被使用,可能还在操作中,跳过预热", file=sys.stderr)
return
max_retries = 3
retry_count = 0
while retry_count < max_retries:
try:
# 检查主浏览器是否存活
if not await self._is_browser_alive():
print(f"[浏览器补充预热] 浏览器未初始化,执行完整预热 (尝试 {retry_count + 1}/{max_retries})", file=sys.stderr)
await self.preheat(target_url)
# preheat() handles its own errors and sets is_preheated, so the flag is not overwritten here
return
# 检查是否已经在目标页面
current_url = self.page.url if self.page else ""
if target_url in current_url:
print(f"[浏览器补充预热] 已在目标页面,无需补充预热: {current_url}", file=sys.stderr)
self.is_preheated = True
return
print(f"[浏览器补充预热] 开始补充预热... (尝试 {retry_count + 1}/{max_retries})", file=sys.stderr)
print(f"[浏览器补充预热] 当前页面: {current_url}", file=sys.stderr)
# 再次检查是否有新的临时浏览器(双重检查)
if len(self.temp_browsers) > 0:
print(f"[浏览器补充预热] 检测到新的临时浏览器启动,取消预热", file=sys.stderr)
return
# 访问目标页面
print(f"[浏览器补充预热] 正在访问: {target_url}", file=sys.stderr)
await self.page.goto(target_url, wait_until='domcontentloaded', timeout=45000)
# 额外等待,确保页面完全加载
await asyncio.sleep(2)
# 验证页面是否正确加载
current_page_url = self.page.url
if target_url in current_page_url or 'creator.xiaohongshu.com' in current_page_url:
self.is_preheated = True
self.last_used_time = time.time()
print("[浏览器补充预热] ✅ 补充预热完成!", file=sys.stderr)
print(f"[浏览器补充预热] 当前页面: {current_page_url}", file=sys.stderr)
return # 成功,退出重试循环
else:
print(f"[浏览器补充预热] 页面未正确加载,期望: {target_url}, 实际: {current_page_url}", file=sys.stderr)
raise Exception(f"页面未正确加载到目标地址")
except Exception as e:
retry_count += 1
print(f"[浏览器补充预热] ⚠️ 补充预热失败 (尝试 {retry_count}/{max_retries}): {str(e)}", file=sys.stderr)
if retry_count < max_retries:
# 等待一段时间后重试
await asyncio.sleep(2)
# 尝试重新初始化浏览器
try:
await self.close() # 关闭当前可能有问题的浏览器
except Exception:
pass # 忽略关闭时的错误
else:
# 所有重试都失败了
print(f"[浏览器补充预热] ❌ 所有重试都失败了,将尝试完整预热", file=sys.stderr)
try:
await self.close() # 先关闭当前浏览器
except Exception:
pass
# 执行完整预热
try:
await self.preheat(target_url)
# preheat() handles its own errors and sets is_preheated, so the flag is not overwritten here
return
except Exception as final_error:
print(f"[浏览器补充预热] ❌ 最终预热也失败: {str(final_error)}", file=sys.stderr)
self.is_preheated = False
# 即使最终失败,也要确保浏览器处于可用状态
try:
await self._init_browser()
except Exception:
pass
async def _create_temp_browser(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
user_agent: Optional[str] = None, headless: bool = True) -> tuple[Browser, BrowserContext, Page]:
"""创建临时浏览器实例(用于并发请求)
Args:
cookies: Cookie列表
proxy: 代理地址
user_agent: 自定义User-Agent
headless: 是否使用无头模式
"""
try:
# 启动Playwright复用全局实例
if not self.playwright:
if sys.platform == 'win32':
try:
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
except Exception as e:
print(f"[临时浏览器] 警告: 设置事件循环策略失败: {str(e)}", file=sys.stderr)
self.playwright = await async_playwright().start()
# 启动浏览器(临时实例,性能优先配置)
launch_kwargs = {
"headless": headless, # 使用传入的headless参数
"args": [
'--disable-blink-features=AutomationControlled',
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
# 性能优化
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process',
'--disable-site-isolation-trials',
'--enable-features=NetworkService,NetworkServiceInProcess',
'--disable-background-timer-throttling',
'--disable-backgrounding-occluded-windows',
'--disable-renderer-backgrounding',
'--disable-background-networking',
# 缓存优化
'--disk-cache-size=268435456',
'--media-cache-size=134217728',
# 渲染优化
'--enable-gpu-rasterization',
'--enable-zero-copy',
'--ignore-gpu-blocklist',
'--enable-accelerated-2d-canvas',
# 网络优化
'--enable-quic',
'--enable-tcp-fast-open',
'--max-connections-per-host=10',
# 减少不必要的功能
'--disable-extensions',
'--disable-breakpad',
'--disable-component-extensions-with-background-pages',
'--disable-ipc-flooding-protection',
'--disable-hang-monitor',
'--disable-prompt-on-repost',
'--disable-domain-reliability',
'--disable-component-update',
# 界面优化
'--hide-scrollbars',
'--mute-audio',
'--no-first-run',
'--no-default-browser-check',
'--metrics-recording-only',
'--force-color-profile=srgb',
],
}
if proxy:
launch_kwargs["proxy"] = {"server": proxy}
browser = await self.playwright.chromium.launch(**launch_kwargs)
# 创建上下文
context_kwargs = {
"viewport": {'width': 1280, 'height': 720},
"user_agent": user_agent or 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
}
context = await browser.new_context(**context_kwargs)
# 注入Cookie
if cookies:
await context.add_cookies(cookies)
# 创建页面
page = await context.new_page()
return browser, context, page
except Exception as e:
print(f"[临时浏览器] 创建失败: {str(e)}", file=sys.stderr)
raise
async def release_temp_browser(self, session_id: str):
"""释放临时浏览器"""
async with self.temp_lock:
if session_id in self.temp_browsers:
browser_info = self.temp_browsers[session_id]
try:
await browser_info["page"].close()
await browser_info["context"].close()
await browser_info["browser"].close()
print(f"[浏览器池] 已释放会话 {session_id} 的临时浏览器", file=sys.stderr)
except Exception as e:
print(f"[浏览器池] 释放临时浏览器异常: {str(e)}", file=sys.stderr)
finally:
del self.temp_browsers[session_id]
def get_stats(self) -> Dict[str, Any]:
"""获取浏览器池统计信息"""
return {
"browser_alive": self.browser is not None,
"context_alive": self.context is not None,
"page_alive": self.page is not None,
"is_preheated": self.is_preheated,
"temp_browsers_count": len(self.temp_browsers),
"max_instances": self.max_instances,
"last_used_time": self.last_used_time,
"idle_seconds": int(time.time() - self.last_used_time) if self.last_used_time > 0 else 0,
"idle_timeout": self.idle_timeout
}
# Global singleton
_browser_pool: Optional[BrowserPool] = None
def get_browser_pool(idle_timeout: int = 1800, headless: bool = True) -> BrowserPool:
    """Return the global browser pool instance (singleton).
    Args:
        idle_timeout: Idle timeout in seconds
        headless: Whether to run headless; False runs headed, which is easier to debug
    """
    global _browser_pool
    if _browser_pool is None:
        print(f"[浏览器池] 创建单例,模式: {'headless' if headless else 'headed'}", file=sys.stderr)
        _browser_pool = BrowserPool(idle_timeout=idle_timeout, headless=headless)
    elif _browser_pool.headless != headless:
        # Update the pool if the headless setting has changed
        print(f"[浏览器池] 检测到headless配置变更: {_browser_pool.headless} -> {headless}", file=sys.stderr)
        _browser_pool.headless = headless
    return _browser_pool

66
backend/config.dev.yaml Normal file

@@ -0,0 +1,66 @@
# 小红书Python服务配置 - 开发环境
# ========== 服务配置 ==========
server:
host: "0.0.0.0"
port: 8000
debug: true
reload: false # Windows环境不建议启用热重载
# ========== 数据库配置 ==========
database:
host: localhost
port: 3306
username: root
password: JKjk20011115
dbname: ai_wht
charset: utf8mb4
max_connections: 10
min_connections: 2
# ========== 浏览器池配置 ==========
browser_pool:
idle_timeout: 1800 # 空闲超时(秒),已禁用自动清理,保持常驻
max_instances: 5 # 最大浏览器实例数
preheat_enabled: true # 是否启用预热
preheat_url: "https://creator.xiaohongshu.com/login" # 预热URL根据login.page自动调整
# ========== 登录/绑定功能配置 ==========
login:
headless: false # 登录/绑定时的浏览器模式: false=有头模式(方便用户操作)true=无头模式
page: "home" # 登录页面类型: creator=创作者中心(creator.xiaohongshu.com/login), home=小红书首页(www.xiaohongshu.com)
# ========== 定时发布调度器配置 ==========
scheduler:
enabled: true # 是否启用定时任务
cron: "*/5 * * * * *" # Cron表达式(秒 分 时 日 月 周) - 每5秒执行一次(开发环境测试)
max_concurrent: 2 # 最大并发发布数
publish_timeout: 300 # 发布超时时间(秒)
max_articles_per_user_per_run: 2 # 每轮每个用户最大发文数
max_failures_per_user_per_run: 3 # 每轮每个用户最大失败次数(达到后暂停本轮后续发布)
max_daily_articles_per_user: 6 # 每个用户每日最大发文数(自动发布)
max_hourly_articles_per_user: 2 # 每个用户每小时最大发文数(自动发布)
headless: false # 浏览器模式: false=有头模式(可调试)true=无头模式(生产环境)
# ========== 防封策略配置 ==========
enable_random_ua: true # 启用随机User-Agent防指纹识别
min_publish_interval: 30 # 最小发布间隔(秒),模拟真人行为
max_publish_interval: 120 # 最大发布间隔(秒),模拟真人行为
# ========== 代理池配置 ==========
proxy_pool:
enabled: false # 默认关闭,按需开启
api_url: "http://api.tianqiip.com/getip?secret=lu29e593&num=1&type=txt&port=1&mr=1&sign=4b81a62eaed89ba802a8f34053e2c964"
# ========== 阿里云短信配置 ==========
ali_sms:
access_key_id: "LTAI5tSMvnCJdqkZtCVWgh8R" # 从环境变量或配置文件读取
access_key_secret: "nyFzXyIi47peVLK4wR2qqbPezmU79W" # 从环境变量或配置文件读取
sign_name: "北京乐航时代科技" # 短信签名
template_code: "SMS_486210104" # 短信模板CODE
code_expire_minutes: 5 # 验证码有效期(分钟)
# ========== 日志配置 ==========
logging:
level: DEBUG
format: "[%(asctime)s] [%(levelname)s] %(message)s"

66
backend/config.prod.yaml Normal file

@@ -0,0 +1,66 @@
# 小红书Python服务配置 - 生产环境
# ========== 服务配置 ==========
server:
host: "0.0.0.0"
port: 8020
debug: false
reload: false
# ========== 数据库配置 ==========
database:
host: 8.149.233.36
port: 3306
username: ai_wht_write
password: 7aK_H2yvokVumr84lLNDt8fDBp6P
dbname: ai_wht
charset: utf8mb4
max_connections: 20
min_connections: 5
# ========== 浏览器池配置 ==========
browser_pool:
idle_timeout: 1800 # 空闲超时(秒),已禁用自动清理,保持常驻
max_instances: 10 # 最大浏览器实例数(生产环境可以更多)
preheat_enabled: true # 是否启用预热
preheat_url: "https://creator.xiaohongshu.com/login" # 预热URL根据login.page自动调整
# ========== 登录/绑定功能配置 ==========
login:
headless: true # 登录/绑定时的浏览器模式: false=有头模式(方便用户操作)true=无头模式
page: "home" # 登录页面类型: creator=创作者中心(creator.xiaohongshu.com/login), home=小红书首页(www.xiaohongshu.com)
# ========== 定时发布调度器配置 ==========
scheduler:
enabled: true # 是否启用定时任务
cron: "0 */5 * * * *" # Cron表达式(秒 分 时 日 月 周) - 每5分钟执行一次
max_concurrent: 5 # 最大并发发布数
publish_timeout: 300 # 发布超时时间(秒)
max_articles_per_user_per_run: 5 # 每轮每个用户最大发文数
max_failures_per_user_per_run: 3 # 每轮每个用户最大失败次数(达到后暂停本轮后续发布)
max_daily_articles_per_user: 20 # 每个用户每日最大发文数(自动发布)
max_hourly_articles_per_user: 3 # 每个用户每小时最大发文数(自动发布)
headless: true # 浏览器模式: false=有头模式(可调试)true=无头模式(生产环境)
# ========== 防封策略配置 ==========
enable_random_ua: true # 启用随机User-Agent防指纹识别
min_publish_interval: 60 # 最小发布间隔生产环境建议60-300秒
max_publish_interval: 300 # 最大发布间隔生产环境建议60-300秒
# ========== 代理池配置 ==========
proxy_pool:
enabled: false # 默认关闭,按需开启
api_url: "http://api.tianqiip.com/getip?secret=lu29e593&num=1&type=txt&port=1&mr=1&sign=4b81a62eaed89ba802a8f34053e2c964"
# ========== 阿里云短信配置 ==========
ali_sms:
access_key_id: "LTAI5tSMvnCJdqkZtCVWgh8R" # 生产环境建议使用环境变量
access_key_secret: "nyFzXyIi47peVLK4wR2qqbPezmU79W" # 生产环境建议使用环境变量
sign_name: "北京乐航时代科技" # 短信签名
template_code: "SMS_486210104" # 短信模板CODE
code_expire_minutes: 5 # 验证码有效期(分钟)
# ========== 日志配置 ==========
logging:
level: INFO
format: "[%(asctime)s] [%(levelname)s] %(message)s"

146
backend/config.py Normal file

@@ -0,0 +1,146 @@
"""
Configuration management module.
Loads configuration from YAML files and supports overrides from environment variables.
"""
import os
import yaml
from typing import Dict, Any
class Config:
    """Configuration wrapper."""
    def __init__(self, config_dict: Dict[str, Any]):
        self._config = config_dict
    def get(self, key: str, default=None):
        """Return a configuration value; nested keys are separated by dots."""
        keys = key.split('.')
        value = self._config
        for k in keys:
            if isinstance(value, dict):
                value = value.get(k)
                if value is None:
                    return default
            else:
                return default
        return value
    def get_dict(self, key: str) -> Dict[str, Any]:
        """Return a configuration section as a dict."""
        value = self.get(key)
        return value if isinstance(value, dict) else {}
    def get_int(self, key: str, default: int = 0) -> int:
        """Return an integer setting."""
        value = self.get(key, default)
        try:
            return int(value)
        except (ValueError, TypeError):
            return default
    def get_bool(self, key: str, default: bool = False) -> bool:
        """Return a boolean setting."""
        value = self.get(key, default)
        if isinstance(value, bool):
            return value
        if isinstance(value, str):
            return value.lower() in ('true', 'yes', '1', 'on')
        return bool(value)
    def get_str(self, key: str, default: str = '') -> str:
        """Return a string setting."""
        value = self.get(key, default)
        return str(value) if value is not None else default


def load_config(env: str = None) -> Config:
    """
    Load the configuration file.

    Args:
        env: Environment name, one of: dev, prod.
            If omitted, it is read from the ENV environment variable (default: dev).

    Returns:
        A Config object.
    """
    # Determine the environment
    if env is None:
        env = os.getenv('ENV', 'dev')

    # Config file path (next to this module)
    config_file = f'config.{env}.yaml'
    config_path = os.path.join(os.path.dirname(__file__), config_file)
    if not os.path.exists(config_path):
        raise FileNotFoundError(f"Config file not found: {config_path}")

    # Load the YAML config; an empty file loads as None, so fall back to an empty dict
    with open(config_path, 'r', encoding='utf-8') as f:
        config_dict = yaml.safe_load(f) or {}

    # Environment variable overrides (common settings only)
    # Database settings
    if os.getenv('DB_HOST'):
        config_dict.setdefault('database', {})['host'] = os.getenv('DB_HOST')
    if os.getenv('DB_PORT'):
        config_dict.setdefault('database', {})['port'] = int(os.getenv('DB_PORT'))
    if os.getenv('DB_USER'):
        config_dict.setdefault('database', {})['username'] = os.getenv('DB_USER')
    if os.getenv('DB_PASSWORD'):
        config_dict.setdefault('database', {})['password'] = os.getenv('DB_PASSWORD')
    if os.getenv('DB_NAME'):
        config_dict.setdefault('database', {})['dbname'] = os.getenv('DB_NAME')

    # Scheduler settings
    if os.getenv('SCHEDULER_ENABLED'):
        config_dict.setdefault('scheduler', {})['enabled'] = os.getenv('SCHEDULER_ENABLED').lower() == 'true'
    if os.getenv('SCHEDULER_CRON'):
        config_dict.setdefault('scheduler', {})['cron'] = os.getenv('SCHEDULER_CRON')
    if os.getenv('SCHEDULER_MAX_CONCURRENT'):
        config_dict.setdefault('scheduler', {})['max_concurrent'] = int(os.getenv('SCHEDULER_MAX_CONCURRENT'))
    if os.getenv('SCHEDULER_PUBLISH_TIMEOUT'):
        config_dict.setdefault('scheduler', {})['publish_timeout'] = int(os.getenv('SCHEDULER_PUBLISH_TIMEOUT'))
    if os.getenv('SCHEDULER_MAX_ARTICLES_PER_USER_PER_RUN'):
        config_dict.setdefault('scheduler', {})['max_articles_per_user_per_run'] = int(os.getenv('SCHEDULER_MAX_ARTICLES_PER_USER_PER_RUN'))
    if os.getenv('SCHEDULER_MAX_FAILURES_PER_USER_PER_RUN'):
        config_dict.setdefault('scheduler', {})['max_failures_per_user_per_run'] = int(os.getenv('SCHEDULER_MAX_FAILURES_PER_USER_PER_RUN'))
    if os.getenv('SCHEDULER_MAX_DAILY_ARTICLES_PER_USER'):
        config_dict.setdefault('scheduler', {})['max_daily_articles_per_user'] = int(os.getenv('SCHEDULER_MAX_DAILY_ARTICLES_PER_USER'))
    if os.getenv('SCHEDULER_MAX_HOURLY_ARTICLES_PER_USER'):
        config_dict.setdefault('scheduler', {})['max_hourly_articles_per_user'] = int(os.getenv('SCHEDULER_MAX_HOURLY_ARTICLES_PER_USER'))

    # Proxy pool settings
    if os.getenv('PROXY_POOL_ENABLED'):
        config_dict.setdefault('proxy_pool', {})['enabled'] = os.getenv('PROXY_POOL_ENABLED').lower() == 'true'
    if os.getenv('PROXY_POOL_API_URL'):
        config_dict.setdefault('proxy_pool', {})['api_url'] = os.getenv('PROXY_POOL_API_URL')

    print(f"[config] Loaded config file: {config_file}")
    print(f"[config] Environment: {env}")
    print(f"[config] Database: {config_dict.get('database', {}).get('host')}:{config_dict.get('database', {}).get('port')}")
    print(f"[config] Scheduler: {'enabled' if config_dict.get('scheduler', {}).get('enabled') else 'disabled'}")
    return Config(config_dict)


# Global config object
app_config: Config = None


def init_config(env: str = None):
    """Initialize the global config."""
    global app_config
    app_config = load_config(env)
    return app_config


def get_config() -> Config:
    """Return the global config object, loading it on first use."""
    global app_config
    if app_config is None:
        app_config = load_config()
    return app_config
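
The override logic in `load_config` repeats one `if os.getenv(...)` block per setting. As a minimal sketch only, the same mapping could be expressed as data. The `ENV_OVERRIDES` table and the `parse_bool` and `apply_env_overrides` helpers are hypothetical names that do not exist in this commit, and only a subset of the variables is shown.

```python
import os
from typing import Any, Callable, Dict, Mapping, Tuple


def parse_bool(raw: str) -> bool:
    """Mirror the rule used in load_config: only 'true' (any case) is True."""
    return raw.lower() == 'true'


# Hypothetical table: env var name -> (config section, key, converter).
ENV_OVERRIDES: Dict[str, Tuple[str, str, Callable[[str], Any]]] = {
    'DB_HOST': ('database', 'host', str),
    'DB_PORT': ('database', 'port', int),
    'SCHEDULER_ENABLED': ('scheduler', 'enabled', parse_bool),
    'SCHEDULER_CRON': ('scheduler', 'cron', str),
}


def apply_env_overrides(
    config: Dict[str, Any],
    environ: Mapping[str, str] = os.environ,
) -> Dict[str, Any]:
    """Apply every non-empty override found in `environ` to `config` in place."""
    for name, (section, key, convert) in ENV_OVERRIDES.items():
        raw = environ.get(name)
        if raw:  # unset or empty variables leave the YAML value untouched
            config.setdefault(section, {})[key] = convert(raw)
    return config
```

Passing the environment as a parameter keeps the helper easy to exercise with a plain dict instead of the real process environment.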


@@ -0,0 +1,98 @@
"""
Damai fixed proxy IP configuration.
Used to route the headless browser through a fixed proxy IP.
"""
# Damai fixed proxy IP pool
DAMAI_PROXY_POOL = [
{
"name": "大麦代理1",
"server": "http://36.137.177.131:50001",
"username": "qqwvy0",
"password": "mun3r7xz",
"enabled": True
},
{
"name": "大麦代理2",
"server": "http://111.132.40.72:50002",
"username": "ih3z07",
"password": "078bt7o5",
"enabled": True
}
]


def get_proxy_config(index: int = 0) -> dict:
    """
    Return the proxy config at the given index.

    Args:
        index: Proxy index (0 or 1).

    Returns:
        Proxy config dict with server, username and password.
    """
    if index < 0 or index >= len(DAMAI_PROXY_POOL):
        raise ValueError(f"Invalid proxy index: {index}, valid range: 0-{len(DAMAI_PROXY_POOL)-1}")
    proxy = DAMAI_PROXY_POOL[index]
    if not proxy.get("enabled", True):
        raise ValueError(f"Proxy is disabled: {proxy['name']}")
    return {
        "server": proxy["server"],
        "username": proxy["username"],
        "password": proxy["password"]
    }


def get_all_enabled_proxies() -> list:
    """
    Return all enabled proxy configs.

    Returns:
        List of proxy config dicts.
    """
    return [
        {
            "server": p["server"],
            "username": p["username"],
            "password": p["password"],
            "name": p["name"]
        }
        for p in DAMAI_PROXY_POOL
        if p.get("enabled", True)
    ]


def get_random_proxy() -> dict:
    """
    Return a randomly chosen enabled proxy config.

    Returns:
        Proxy config dict.
    """
    import random
    enabled_proxies = [p for p in DAMAI_PROXY_POOL if p.get("enabled", True)]
    if not enabled_proxies:
        raise ValueError("No proxy available")
    proxy = random.choice(enabled_proxies)
    return {
        "server": proxy["server"],
        "username": proxy["username"],
        "password": proxy["password"],
        "name": proxy["name"]
    }


# Shortcuts
def get_proxy_1():
    """Return the config for proxy 1."""
    return get_proxy_config(0)


def get_proxy_2():
    """Return the config for proxy 2."""
    return get_proxy_config(1)
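
The pool in this file keeps proxy usernames and passwords in source. One possible alternative, shown here only as a hedged sketch, is to read them from environment variables. The variable names `DAMAI_PROXY_<n>_SERVER`, `DAMAI_PROXY_<n>_USERNAME`, `DAMAI_PROXY_<n>_PASSWORD` and the helper `load_proxy_pool_from_env` are assumptions made for illustration and are not defined anywhere in this commit.

```python
import os
from typing import Dict, List, Mapping


def load_proxy_pool_from_env(
    environ: Mapping[str, str] = os.environ,
    max_proxies: int = 10,
) -> List[Dict[str, str]]:
    """Build a proxy pool from numbered environment variables (hypothetical naming)."""
    pool: List[Dict[str, str]] = []
    for n in range(1, max_proxies + 1):
        server = environ.get(f'DAMAI_PROXY_{n}_SERVER')
        if not server:
            continue  # slot n is not configured
        pool.append({
            'name': f'damai-proxy-{n}',
            'server': server,
            'username': environ.get(f'DAMAI_PROXY_{n}_USERNAME', ''),
            'password': environ.get(f'DAMAI_PROXY_{n}_PASSWORD', ''),
        })
    return pool
```

Each returned dict carries the `server`, `username` and `password` keys that the functions above already return, so callers would not need to change.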

202
backend/debug_login_page.py Normal file

@@ -0,0 +1,202 @@
"""
小红书登录页面调试脚本
用于调试登录页面结构和元素选择器
"""
import asyncio
import sys
from xhs_login import XHSLoginService
async def debug_login_page(proxy_index: int = 0):
"""
调试登录页面,查看页面结构和可用元素
"""
print(f"\n{'='*60}")
print(f"🔍 调试小红书登录页面")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
# 创建登录服务
login_service = XHSLoginService(use_pool=False) # 不使用池,便于调试
try:
# 初始化浏览器(使用代理)
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
print("✅ 浏览器初始化成功(已启用代理)")
# 访问登录页面
print(f"\n🌐 访问小红书创作者平台登录页...")
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=30000)
await asyncio.sleep(5) # 等待更长时间让页面完全加载
# 获取页面标题和URL
title = await login_service.page.title()
url = login_service.page.url
print(f"✅ 页面加载完成")
print(f" 标题: {title}")
print(f" URL: {url}")
# 获取页面内容
content = await login_service.page.content()
print(f" 页面内容长度: {len(content)} 字符")
# 查找所有input元素
print(f"\n🔍 查找所有input元素...")
inputs = await login_service.page.query_selector_all('input')
print(f" 找到 {len(inputs)} 个input元素")
for i, inp in enumerate(inputs):
try:
placeholder = await inp.get_attribute('placeholder')
input_type = await inp.get_attribute('type')
name = await inp.get_attribute('name')
class_name = await inp.get_attribute('class')
id_attr = await inp.get_attribute('id')
print(f" Input {i+1}:")
print(f" - placeholder: {placeholder}")
print(f" - type: {input_type}")
print(f" - name: {name}")
print(f" - id: {id_attr}")
print(f" - class: {class_name}")
except Exception as e:
print(f" Input {i+1}: 获取属性失败 - {str(e)}")
# 查找所有可能的手机号输入框选择器
print(f"\n🔍 尝试常见手机号输入框选择器...")
phone_selectors = [
'input[placeholder="手机号"]',
'input[placeholder*="手机"]',
'input[type="tel"]',
'input[type="text"][placeholder*=""]',
'input[placeholder*="Phone"]',
'input[name*="phone"]',
'input[placeholder*="号码"]',
'input[placeholder*="mobile"]',
'input[placeholder*="Mobile"]'
]
found_inputs = []
for selector in phone_selectors:
try:
element = await login_service.page.query_selector(selector)
if element:
found_inputs.append((selector, element))
placeholder = await element.get_attribute('placeholder')
print(f" ✅ 找到: {selector} (placeholder: {placeholder})")
except Exception as e:
print(f" ❌ 选择器 {selector} 失败: {str(e)}")
if not found_inputs:
print(" ❌ 未找到任何手机号相关输入框")
# 查找所有按钮元素
print(f"\n🔍 查找所有button元素...")
buttons = await login_service.page.query_selector_all('button')
print(f" 找到 {len(buttons)} 个button元素")
for i, btn in enumerate(buttons[:10]): # 只显示前10个
try:
text = await btn.inner_text()
class_name = await btn.get_attribute('class')
id_attr = await btn.get_attribute('id')
print(f" Button {i+1}:")
print(f" - text: '{text.strip()}'")
print(f" - class: {class_name}")
print(f" - id: {id_attr}")
except Exception as e:
print(f" Button {i+1}: 获取信息失败 - {str(e)}")
# 查找发送验证码按钮
print(f"\n🔍 尝试常见发送验证码按钮选择器...")
code_selectors = [
'text="发送验证码"',
'text="获取验证码"',
'text="发送"',
'text="获取"',
'button:has-text("验证码")',
'button:has-text("发送")',
'button:has-text("获取")',
'[class*="send"]',
'[class*="code"]',
'[class*="verify"]'
]
found_buttons = []
for selector in code_selectors:
try:
element = await login_service.page.query_selector(selector)
if element:
found_buttons.append((selector, element))
text = await element.inner_text()
print(f" ✅ 找到: {selector} (text: '{text.strip()}')")
except Exception as e:
print(f" ❌ 选择器 {selector} 失败: {str(e)}")
if not found_buttons:
print(" ❌ 未找到任何验证码相关按钮")
# 打印页面HTML片段用于分析结构
print(f"\n📄 页面HTML片段前1000字符...")
print(content[:1000])
print(f"\n📄 页面HTML片段1000-2000字符...")
print(content[1000:2000])
# 等待用户交互(保持浏览器打开)
print(f"\n⏸️ 浏览器保持打开状态,您可以手动检查页面")
print(f" URL: {url}")
print(f" 按 Ctrl+C 关闭浏览器...")
try:
while True:
await asyncio.sleep(1)
except KeyboardInterrupt:
print(f"\n⏹️ 用户中断,关闭浏览器...")
except Exception as e:
print(f"❌ 调试过程异常: {str(e)}")
import traceback
traceback.print_exc()
finally:
await login_service.close_browser()
async def main():
"""主函数"""
print("="*60)
print("🔍 小红书登录页面调试工具")
print("="*60)
print("\n此工具将帮助您分析小红书登录页面的结构")
print("以便正确识别手机号输入框和验证码按钮")
proxy_choice = input("\n请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
await debug_login_page(proxy_idx)
print(f"\n{'='*60}")
print("✅ 调试完成!")
print("="*60)
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行调试
asyncio.run(main())

Binary file not shown (image, 352 KiB).

146
backend/error_screenshot.py Normal file

@@ -0,0 +1,146 @@
"""
Error screenshot utility.
Automatically captures and saves a screenshot when an error occurs, to help with troubleshooting.
"""
import os
import sys
from datetime import datetime
from pathlib import Path
from typing import Optional
from playwright.async_api import Page

# Directory where screenshots are saved
SCREENSHOT_DIR = Path("error_screenshots")
SCREENSHOT_DIR.mkdir(exist_ok=True)


async def save_error_screenshot(
    page: Optional[Page],
    error_type: str,
    error_message: str = "",
    prefix: str = ""
) -> Optional[str]:
    """
    Save an error screenshot.

    Args:
        page: Playwright page object.
        error_type: Error type (login_failed, send_code_failed, publish_failed, etc.).
        error_message: Error message (optional, written to the log).
        prefix: File name prefix (optional).

    Returns:
        Path of the screenshot file, or None on failure.
    """
    if not page:
        print("[error-screenshot] Page object is empty, cannot take a screenshot", file=sys.stderr)
        return None
    try:
        # File name format: YYYYMMDD_HHMMSS_<error type>.png
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        # Sanitize the error type (drop special characters)
        safe_error_type = "".join(c for c in error_type if c.isalnum() or c in ('_', '-'))
        # Assemble the file name
        if prefix:
            filename = f"{prefix}_{timestamp}_{safe_error_type}.png"
        else:
            filename = f"{timestamp}_{safe_error_type}.png"
        filepath = SCREENSHOT_DIR / filename

        # Take the screenshot
        await page.screenshot(path=str(filepath), full_page=True)

        # Log the result
        print(f"[error-screenshot] Saved: {filepath}", file=sys.stderr)
        if error_message:
            print(f"[error-screenshot] Error message: {error_message}", file=sys.stderr)
        return str(filepath)
    except Exception as e:
        print(f"[error-screenshot] Screenshot failed: {str(e)}", file=sys.stderr)
        return None


def cleanup_old_screenshots(days: int = 7):
    """
    Delete old error screenshots.

    Args:
        days: Keep screenshots from the last N days (default: 7).
    """
    try:
        import time
        current_time = time.time()
        cutoff_time = current_time - (days * 24 * 60 * 60)
        deleted_count = 0
        for file in SCREENSHOT_DIR.glob("*.png"):
            # Compare the file's modification time with the cutoff
            if file.stat().st_mtime < cutoff_time:
                file.unlink()
                deleted_count += 1
        if deleted_count > 0:
            print(f"[error-screenshot] Removed {deleted_count} screenshots older than {days} days", file=sys.stderr)
    except Exception as e:
        print(f"[error-screenshot] Failed to clean up old screenshots: {str(e)}", file=sys.stderr)


async def save_screenshot_with_html(
    page: Optional[Page],
    error_type: str,
    error_message: str = "",
    prefix: str = ""
) -> tuple[Optional[str], Optional[str]]:
    """
    Save an error screenshot together with the page HTML source (for in-depth debugging).

    Args:
        page: Playwright page object.
        error_type: Error type.
        error_message: Error message (optional).
        prefix: File name prefix (optional).

    Returns:
        (screenshot path, HTML path), or (None, None) on failure.
    """
    if not page:
        return None, None
    try:
        # Build the base file name
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        safe_error_type = "".join(c for c in error_type if c.isalnum() or c in ('_', '-'))
        if prefix:
            base_filename = f"{prefix}_{timestamp}_{safe_error_type}"
        else:
            base_filename = f"{timestamp}_{safe_error_type}"

        # Save the screenshot
        screenshot_path = SCREENSHOT_DIR / f"{base_filename}.png"
        await page.screenshot(path=str(screenshot_path), full_page=True)

        # Save the HTML
        html_path = SCREENSHOT_DIR / f"{base_filename}.html"
        html_content = await page.content()
        with open(html_path, 'w', encoding='utf-8') as f:
            f.write(html_content)

        print(f"[error-screenshot] Saved screenshot: {screenshot_path}", file=sys.stderr)
        print(f"[error-screenshot] Saved HTML: {html_path}", file=sys.stderr)
        if error_message:
            print(f"[error-screenshot] Error message: {error_message}", file=sys.stderr)
        return str(screenshot_path), str(html_path)
    except Exception as e:
        print(f"[error-screenshot] Failed to save screenshot and HTML: {str(e)}", file=sys.stderr)
        return None, None
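
Both `save_error_screenshot` and `save_screenshot_with_html` build their file names with the same timestamp and sanitizing steps. A small sketch of how that shared step might be factored out follows. The helper name `build_screenshot_basename` and its `now` parameter are hypothetical and are introduced only so the naming rule can be checked in isolation.

```python
from datetime import datetime
from typing import Optional


def build_screenshot_basename(
    error_type: str,
    prefix: str = "",
    now: Optional[datetime] = None,
) -> str:
    """Return '<prefix>_<YYYYMMDD_HHMMSS>_<error type>' without a file extension."""
    timestamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # Same rule as the module above: keep letters, digits, '_' and '-'
    safe_error_type = "".join(c for c in error_type if c.isalnum() or c in ('_', '-'))
    parts = [timestamp, safe_error_type]
    if prefix:
        parts.insert(0, prefix)
    return "_".join(parts)
```

Accepting an optional `now` value makes the result deterministic in tests while keeping the default behavior of using the current time.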

Binary file not shown (image, 166 KiB).

Binary file not shown (image, 245 KiB).


@@ -0,0 +1,200 @@
"""
Damai fixed proxy usage examples.
Shows how to use the fixed proxy IPs in a real project.
"""
import asyncio
import sys
from browser_pool import get_browser_pool
from damai_proxy_config import get_proxy_1, get_proxy_2, get_random_proxy
async def example1_use_specific_proxy():
"""示例1: 使用指定的代理IP"""
print("\n" + "="*60)
print("示例1: 使用指定的代理IP代理1")
print("="*60)
# 获取代理1的配置
proxy_config = get_proxy_1()
print(f"📌 使用代理: {proxy_config['server']}")
# 获取浏览器池
pool = get_browser_pool()
try:
# 获取浏览器实例(带代理)
# 注意需要修改browser_pool以支持带认证的代理
browser, context, page = await pool.get_browser(
proxy=proxy_config["server"]
)
# 访问测试页面
print("🌐 访问IP检测页面...")
await page.goto("http://httpbin.org/ip", timeout=30000)
# 获取IP信息
ip_info = await page.evaluate("() => document.body.innerText")
print(f"✅ 当前IP:\n{ip_info}")
except Exception as e:
print(f"❌ 错误: {str(e)}")
async def example2_use_random_proxy():
"""示例2: 随机使用一个代理IP"""
print("\n" + "="*60)
print("示例2: 随机使用一个代理IP")
print("="*60)
# 随机获取一个代理
proxy_config = get_random_proxy()
print(f"📌 随机选择代理: {proxy_config['name']}")
print(f" 服务器: {proxy_config['server']}")
# 后续操作类似示例1
print("✅ 代理配置已获取,可以用于浏览器实例化")
async def example3_use_with_playwright_directly():
"""示例3: 直接在Playwright中使用代理带认证"""
print("\n" + "="*60)
print("示例3: 直接在Playwright中使用代理完整认证")
print("="*60)
from playwright.async_api import async_playwright
# 获取代理配置
proxy_config = get_proxy_2()
print(f"📌 使用代理2: {proxy_config['server']}")
playwright = None
browser = None
try:
# 启动Playwright
playwright = await async_playwright().start()
# 配置代理(完整配置,包含认证信息)
proxy_settings = {
"server": proxy_config["server"],
"username": proxy_config["username"],
"password": proxy_config["password"]
}
# 启动浏览器
browser = await playwright.chromium.launch(
headless=True,
proxy=proxy_settings,
args=['--disable-blink-features=AutomationControlled']
)
# 创建上下文和页面
context = await browser.new_context()
page = await context.new_page()
# 访问测试页面
print("🌐 访问大麦网...")
await page.goto("https://www.damai.cn/", timeout=30000)
title = await page.title()
print(f"✅ 页面标题: {title}")
print(f" 当前URL: {page.url}")
except Exception as e:
print(f"❌ 错误: {str(e)}")
finally:
if browser:
await browser.close()
if playwright:
await playwright.stop()


async def example4_switch_proxy_on_error():
    """Example 4: switch to another proxy automatically when one fails."""
    print("\n" + "="*60)
    print("Example 4: switch to another proxy automatically when one fails")
    print("="*60)
    from damai_proxy_config import get_all_enabled_proxies
    from playwright.async_api import async_playwright
    proxies = get_all_enabled_proxies()
    print(f"📊 Available proxies: {len(proxies)}")
    for i, proxy_config in enumerate(proxies):
        print(f"\n🔄 Trying proxy {i+1}/{len(proxies)}: {proxy_config['name']}")
        playwright = None
        browser = None
        try:
            # Start Playwright
            playwright = await async_playwright().start()
            # Proxy settings
            proxy_settings = {
                "server": proxy_config["server"],
                "username": proxy_config["username"],
                "password": proxy_config["password"]
            }
            # Launch the browser
            browser = await playwright.chromium.launch(
                headless=True,
                proxy=proxy_settings
            )
            context = await browser.new_context()
            page = await context.new_page()
            # Test request
            await page.goto("http://httpbin.org/ip", timeout=15000)
            ip_info = await page.evaluate("() => document.body.innerText")
            print(f"✅ {proxy_config['name']} is available")
            print(f"   IP info: {ip_info.strip()}")
            # Stop at the first working proxy (the finally block still runs)
            break
        except Exception as e:
            print(f"⚠️ {proxy_config['name']} is unavailable: {str(e)}")
        finally:
            # Release browser resources once, on both success and failure
            if browser:
                await browser.close()
            if playwright:
                await playwright.stop()
    else:
        # The loop finished without a break: no proxy worked
        print("❌ All proxies are unavailable!")
async def main():
"""运行所有示例"""
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
print("\n" + "🎯"*30)
print("大麦固定代理IP使用示例集")
print("🎯"*30)
# 示例2: 随机代理
await example2_use_random_proxy()
# 示例3: 完整的Playwright代理使用
await example3_use_with_playwright_directly()
# 示例4: 代理容错切换
await example4_switch_proxy_on_error()
print("\n" + "="*60)
print("🎉 所有示例运行完成!")
print("="*60)
if __name__ == "__main__":
asyncio.run(main())
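
The failover loop in example 4 can be separated from Playwright so the retry logic is testable without a browser. The following is a minimal sketch under that assumption: `first_working_proxy` and its `probe` callback are hypothetical names, and `probe` stands in for whatever check (such as loading an IP echo page) the caller wants to run through each proxy.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple

Proxy = Dict[str, Any]


async def first_working_proxy(
    proxies: List[Proxy],
    probe: Callable[[Proxy], Awaitable[Any]],
) -> Optional[Tuple[Proxy, Any]]:
    """Try each proxy in order; return (proxy, probe result) for the first success."""
    for proxy in proxies:
        try:
            result = await probe(proxy)
        except Exception as exc:  # any failure means: move on to the next proxy
            print(f"{proxy['name']} failed: {exc}")
            continue
        return proxy, result
    return None  # every proxy failed, or the list was empty


async def _demo() -> None:
    async def fake_probe(proxy: Proxy) -> str:
        # Stand-in for a real network check
        if proxy["name"] == "bad":
            raise RuntimeError("connection refused")
        return "ok"

    print(await first_working_proxy([{"name": "bad"}, {"name": "good"}], fake_probe))


if __name__ == "__main__":
    asyncio.run(_demo())
```

Keeping resource cleanup inside the `probe` callback (for example with `try`/`finally`) leaves this loop responsible only for ordering and fallback.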


@@ -1,14 +1,32 @@
# Windows兼容性必须在任何异步操作之前设置事件循环策略
import sys
import asyncio
import aiohttp
import json
if sys.platform == 'win32':
# Windows下使用ProactorEventLoopPolicy来支持Playwright的子进程
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
print("[系统] Windows环境已设置ProactorEventLoopPolicy", file=sys.stderr)
# 加载配置
from config import init_config, get_config
from dotenv import load_dotenv
load_dotenv() # 从 .env 文件加载环境变量(可选,用于覆盖配置文件)
from fastapi import FastAPI, HTTPException, File, UploadFile, Form
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional, Dict, Any, List
import asyncio
from datetime import datetime
import os
import shutil
from pathlib import Path
from xhs_login import XHSLoginService
from browser_pool import get_browser_pool
from scheduler import XHSScheduler
from error_screenshot import cleanup_old_screenshots
from ali_sms_service import AliSmsService
app = FastAPI(title="小红书登录API")
@@ -21,8 +39,54 @@ app.add_middleware(
allow_headers=["*"],
)
# 全局登录服务实例
login_service = XHSLoginService()
# 全局登录服务实例延迟初始化避免在startup前创建浏览器池
login_service = None
# 全局浏览器池实例在startup时初始化
browser_pool = None
# 全局调度器实例
scheduler = None
# 全局阿里云短信服务实例
sms_service = None
async def fetch_proxy_from_pool() -> Optional[str]:
    """Fetch one proxy address (http://ip:port) from the proxy pool API; return None on failure."""
    config = get_config()
    if not config.get_bool('proxy_pool.enabled', False):
        return None
    api_url = config.get_str('proxy_pool.api_url', '')
    if not api_url:
        return None
    try:
        timeout = aiohttp.ClientTimeout(total=10)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(api_url) as resp:
                if resp.status != 200:
                    print(f"[proxy-pool] API returned a non-200 status code: {resp.status}", file=sys.stderr)
                    return None
                text = (await resp.text()).strip()
                if not text:
                    print("[proxy-pool] Response body is empty", file=sys.stderr)
                    return None
                # Only the first line of the response is used
                line = text.splitlines()[0].strip()
                if not line:
                    print("[proxy-pool] First line is empty", file=sys.stderr)
                    return None
                # Add a scheme when the API returns a bare ip:port
                if line.startswith(("http://", "https://")):
                    return line
                return "http://" + line
    except Exception as e:
        print(f"[proxy-pool] Request failed: {str(e)}", file=sys.stderr)
        return None
# 临时文件存储目录
TEMP_DIR = Path("temp_uploads")
@@ -32,11 +96,19 @@ TEMP_DIR.mkdir(exist_ok=True)
class SendCodeRequest(BaseModel):
phone: str
country_code: str = "+86"
login_page: Optional[str] = None # 登录页面creator 或 home为None时使用配置文件默认值
class VerifyCodeRequest(BaseModel):
phone: str
code: str
country_code: str = "+86"
class LoginRequest(BaseModel):
phone: str
code: str
country_code: str = "+86"
login_page: Optional[str] = None # 登录页面creator 或 home为None时使用配置文件默认值
session_id: Optional[str] = None # 可选复用send-code接口的session_id
class PublishNoteRequest(BaseModel):
title: str
@@ -44,8 +116,20 @@ class PublishNoteRequest(BaseModel):
images: Optional[list] = None
topics: Optional[list] = None
class PublishWithCookiesRequest(BaseModel):
cookies: Optional[list] = None # 兼容旧版仅传Cookies
login_state: Optional[dict] = None # 新版传完整的login_state
storage_state_path: Optional[str] = None # 新增storage_state文件路径最优先
phone: Optional[str] = None # 新增手机号用于查找storage_state文件
title: str
content: str
images: Optional[list] = None
topics: Optional[list] = None
class InjectCookiesRequest(BaseModel):
cookies: list
cookies: Optional[list] = None # 兼容旧版仅传Cookies
login_state: Optional[dict] = None # 新版传完整的login_state
target_page: Optional[str] = "creator" # 目标页面creator 或 home
# 响应模型
class BaseResponse(BaseModel):
@@ -55,32 +139,241 @@ class BaseResponse(BaseModel):
@app.on_event("startup")
async def startup_event():
"""启动时不初始化浏览器,等待第一次请求时再初始化"""
pass
"""启动时启动后台清理任务和定时发布任务(已禁用预热)"""
# 初始化配置从ENV环境变量读取默认dev
config = init_config()
print("[服务启动] FastAPI服务启动浏览器池已就绪")
# 清理旧的错误截图保留最近7天
try:
cleanup_old_screenshots(days=7)
except Exception as e:
print(f"[启动] 清理旧截图失败: {str(e)}")
# 从配置文件读取headless参数
headless = config.get_bool('scheduler.headless', True) # 定时发布的headless配置
login_headless = config.get_bool('login.headless', False) # 登录/绑定的headless配置默认为有头模式
login_page = config.get_str('login.page', 'creator') # 登录页面类型,默认为创作者中心
# 根据配置自动调整预热URL
if login_page == "home":
preheat_url = "https://www.xiaohongshu.com"
else:
preheat_url = "https://creator.xiaohongshu.com/login"
# 初始化全局浏览器池使用配置的headless参数
global browser_pool, login_service, sms_service
browser_pool = get_browser_pool(idle_timeout=1800, headless=headless)
print(f"[服务启动] 浏览器池模式: {'headless(无头模式)' if headless else 'headed(有头模式)'}")
# 初始化登录服务使用独立的login.headless配置
login_service = XHSLoginService(use_pool=True, headless=login_headless)
print(f"[服务启动] 登录服务模式: {'headless(无头模式)' if login_headless else 'headed(有头模式)'}")
# 初始化阿里云短信服务
sms_dict = config.get_dict('ali_sms')
sms_service = AliSmsService(
access_key_id=sms_dict.get('access_key_id', ''),
access_key_secret=sms_dict.get('access_key_secret', ''),
sign_name=sms_dict.get('sign_name', ''),
template_code=sms_dict.get('template_code', '')
)
print("[服务启动] 阿里云短信服务已初始化")
# 启动浏览器池清理任务
asyncio.create_task(browser_cleanup_task())
# 已禁用预热功能,避免干扰正常业务流程
# asyncio.create_task(browser_preheat_task())
print("[服务启动] 浏览器预热功能已禁用")
# 启动定时发布任务
global scheduler
# 从配置文件读取数据库配置
db_dict = config.get_dict('database')
db_config = {
'host': db_dict.get('host', 'localhost'),
'port': db_dict.get('port', 3306),
'user': db_dict.get('username', 'root'),
'password': db_dict.get('password', ''),
'database': db_dict.get('dbname', 'ai_wht')
}
# 从配置文件读取调度器配置
scheduler_enabled = config.get_bool('scheduler.enabled', False)
proxy_pool_enabled = config.get_bool('proxy_pool.enabled', False)
proxy_pool_api_url = config.get_str('proxy_pool.api_url', '')
enable_random_ua = config.get_bool('scheduler.enable_random_ua', True)
min_publish_interval = config.get_int('scheduler.min_publish_interval', 30)
max_publish_interval = config.get_int('scheduler.max_publish_interval', 120)
# headless已经在上面读取了
if scheduler_enabled:
scheduler = XHSScheduler(
db_config=db_config,
max_concurrent=config.get_int('scheduler.max_concurrent', 2),
publish_timeout=config.get_int('scheduler.publish_timeout', 300),
max_articles_per_user_per_run=config.get_int('scheduler.max_articles_per_user_per_run', 2),
max_failures_per_user_per_run=config.get_int('scheduler.max_failures_per_user_per_run', 3),
max_daily_articles_per_user=config.get_int('scheduler.max_daily_articles_per_user', 6),
max_hourly_articles_per_user=config.get_int('scheduler.max_hourly_articles_per_user', 2),
proxy_pool_enabled=proxy_pool_enabled,
proxy_pool_api_url=proxy_pool_api_url,
enable_random_ua=enable_random_ua,
min_publish_interval=min_publish_interval,
max_publish_interval=max_publish_interval,
headless=headless, # 新增: 传递headless参数
)
cron_expr = config.get_str('scheduler.cron', '*/5 * * * * *')
scheduler.start(cron_expr)
print(f"[服务启动] 定时发布任务已启动Cron: {cron_expr}")
else:
print("[服务启动] 定时发布任务未启用")
async def browser_cleanup_task():
"""后台任务:定期清理空闲浏览器"""
while True:
await asyncio.sleep(300) # 每5分钟检查一次
try:
await browser_pool.cleanup_if_idle()
except Exception as e:
print(f"[清理任务] 浏览器清理异常: {str(e)}")
async def browser_preheat_task():
"""后台任务:预热浏览器"""
try:
# 延迟3秒启动避免影响服务启动速度
await asyncio.sleep(3)
print("[预热任务] 开始预热浏览器...")
await browser_pool.preheat("https://creator.xiaohongshu.com/login")
except Exception as e:
print(f"[预热任务] 预热失败: {str(e)}")
async def repreheat_browser_after_use():
"""后台任务:使用后补充预热浏览器(仅用于登录流程)"""
try:
# 延迟5秒确保
# 1. 响应已经返回给用户
# 2. Cookie已经完全获取并保存
# 3. 登录流程完全结束
await asyncio.sleep(5)
print("[补充预热任务] 开始补充预热浏览器...")
await browser_pool.repreheat("https://creator.xiaohongshu.com/login")
except Exception as e:
print(f"[补充预热任务] 补充预热失败: {str(e)}")
@app.on_event("shutdown")
async def shutdown_event():
"""关闭时清理浏览器"""
await login_service.close_browser()
"""关闭时清理浏览器池和停止调度器"""
print("[服务关闭] 正在关闭服务...")
# 停止调度器
global scheduler
if scheduler:
scheduler.stop()
print("[服务关闭] 调度器已停止")
# 关闭浏览器池
await browser_pool.close()
print("[服务关闭] 浏览器池已关闭")
@app.post("/api/xhs/send-code", response_model=BaseResponse)
async def send_code(request: SendCodeRequest):
"""
发送验证码
通过playwright访问小红书官网输入手机号并触发验证码发送
支持选择从创作者中心或小红书首页登录
并发支持:为每个请求分配独立的浏览器实例
"""
# 使用手机号作为session_id确保发送验证码和登录验证使用同一个浏览器
session_id = f"xhs_login_{request.phone}"
print(f"[发送验证码] session_id={session_id}, phone={request.phone}", file=sys.stderr)
# 获取配置中的默认login_page如果API传入了则优先使用API参数
config = get_config()
default_login_page = config.get_str('login.page', 'creator')
login_page = request.login_page if request.login_page else default_login_page
print(f"[发送验证码] 使用登录页面: {login_page} (配置默认={default_login_page}, API参数={request.login_page})", file=sys.stderr)
try:
# 为此请求创建独立的登录服务实例使用session_id实现并发隔离
request_login_service = XHSLoginService(
use_pool=True,
headless=login_service.headless, # 使用配置文件中的login.headless配置
session_id=session_id # 关键传递session_id
)
# 调用登录服务发送验证码
result = await login_service.send_verification_code(
result = await request_login_service.send_verification_code(
phone=request.phone,
country_code=request.country_code
country_code=request.country_code,
login_page=login_page # 传递登录页面参数
)
if result["success"]:
return BaseResponse(
code=0,
message="验证码已发送请在小红书APP中查看",
data={"sent_at": datetime.now().isoformat()}
data={
"sent_at": datetime.now().isoformat(),
"session_id": session_id # 返回session_id供前端使用
}
)
else:
# 发送失败,释放临时浏览器
if session_id and browser_pool:
try:
await browser_pool.release_temp_browser(session_id)
print(f"[发送验证码] 已释放临时浏览器: {session_id}", file=sys.stderr)
except Exception as e:
print(f"[发送验证码] 释放临时浏览器失败: {str(e)}", file=sys.stderr)
return BaseResponse(
code=1,
message=result.get("error", "发送验证码失败"),
data=None
)
except Exception as e:
print(f"发送验证码异常: {str(e)}", file=sys.stderr)
# 异常情况,释放临时浏览器
if session_id and browser_pool:
try:
await browser_pool.release_temp_browser(session_id)
print(f"[发送验证码] 已释放临时浏览器: {session_id}", file=sys.stderr)
except Exception as release_error:
print(f"[发送验证码] 释放临时浏览器失败: {str(release_error)}", file=sys.stderr)
return BaseResponse(
code=1,
message=f"发送验证码失败: {str(e)}",
data=None
)
@app.post("/api/xhs/phone/send-code", response_model=BaseResponse)
async def send_phone_code(request: SendCodeRequest):
"""
发送手机短信验证码(使用阿里云短信服务)
用于小红书手机号验证码登录
"""
try:
# 调用阿里云短信服务发送验证码
result = await sms_service.send_verification_code(request.phone)
if result["success"]:
return BaseResponse(
code=0,
message=result.get("message", "验证码已发送"),
data={
"sent_at": datetime.now().isoformat(),
# 开发环境返回验证码,生产环境应移除
"code": result.get("code") if get_config().get_bool('server.debug', False) else None
}
)
else:
return BaseResponse(
@@ -90,28 +383,104 @@ async def send_code(request: SendCodeRequest):
)
except Exception as e:
print(f"发送验证码异常: {str(e)}")
print(f"发送短信验证码异常: {str(e)}")
return BaseResponse(
code=1,
message=f"发送验证码失败: {str(e)}",
data=None
)
@app.post("/api/xhs/phone/verify-code", response_model=BaseResponse)
async def verify_phone_code(request: VerifyCodeRequest):
"""
验证手机短信验证码
用于小红书手机号验证码登录
"""
try:
# 调用阿里云短信服务验证验证码
result = sms_service.verify_code(request.phone, request.code)
if result["success"]:
return BaseResponse(
code=0,
message="验证码验证成功",
data={"verified_at": datetime.now().isoformat()}
)
else:
return BaseResponse(
code=1,
message=result.get("error", "验证码验证失败"),
data=None
)
except Exception as e:
print(f"验证验证码异常: {str(e)}")
return BaseResponse(
code=1,
message=f"验证失败: {str(e)}",
data=None
)
@app.post("/api/xhs/login", response_model=BaseResponse)
async def login(request: LoginRequest):
"""
登录验证
用户填写验证码后,完成登录并获取小红书返回的数据
支持选择从创作者中心或小红书首页登录
并发支持可复用send-code接口的session_id
"""
# 使用手机号作为session_id复用发送验证码时的浏览器
# 如果前端传了session_id就使用前端的否则根据手机号生成
if not request.session_id:
session_id = f"xhs_login_{request.phone}"
else:
session_id = request.session_id
print(f"[登录验证] session_id={session_id}, phone={request.phone}", file=sys.stderr)
# 获取配置中的默认login_page如果API传入了则优先使用API参数
config = get_config()
default_login_page = config.get_str('login.page', 'creator')
login_page = request.login_page if request.login_page else default_login_page
print(f"[登录验证] 使用登录页面: {login_page} (配置默认={default_login_page}, API参数={request.login_page})", file=sys.stderr)
try:
# 如果有session_id复用send-code的浏览器否则创建新的
if session_id:
print(f"[登录验证] 复用send-code的浏览器: {session_id}", file=sys.stderr)
request_login_service = XHSLoginService(
use_pool=True,
headless=login_service.headless, # 使用配置文件中的login.headless配置
session_id=session_id
)
# 初始化浏览器,以便从浏览器池获取临时浏览器
await request_login_service.init_browser()
else:
# 旧逻辑不传session_id使用全局登录服务
print(f"[登录验证] 使用全局登录服务(旧逻辑)", file=sys.stderr)
request_login_service = login_service
# 调用登录服务进行登录
result = await login_service.login(
result = await request_login_service.login(
phone=request.phone,
code=request.code,
country_code=request.country_code
country_code=request.country_code,
login_page=login_page # 传递登录页面参数
)
# 释放临时浏览器(无论成功还是失败)
if session_id and browser_pool:
try:
await browser_pool.release_temp_browser(session_id)
print(f"[登录验证] 已释放临时浏览器: {session_id}", file=sys.stderr)
except Exception as e:
print(f"[登录验证] 释放临时浏览器失败: {str(e)}", file=sys.stderr)
if result["success"]:
# 登录成功,不再触发预热(已禁用预热功能)
# asyncio.create_task(repreheat_browser_after_use())
return BaseResponse(
code=0,
message="登录成功",
@@ -119,10 +488,16 @@ async def login(request: LoginRequest):
"user_info": result.get("user_info"),
"cookies": result.get("cookies"), # 键值对格式(前端展示)
"cookies_full": result.get("cookies_full"), # Playwright完整格式数据库存储/脚本使用)
"login_state": result.get("login_state"), # 完整登录状态包含cookies + localStorage + sessionStorage
"localStorage": result.get("localStorage"), # localStorage数据
"sessionStorage": result.get("sessionStorage"), # sessionStorage数据
"url": result.get("url"), # 当前URL
"storage_state_path": result.get("storage_state_path"), # storage_state文件路径
"login_time": datetime.now().isoformat()
}
)
else:
# 登录失败
return BaseResponse(
code=1,
message=result.get("error", "登录失败"),
@@ -130,7 +505,16 @@ async def login(request: LoginRequest):
)
except Exception as e:
print(f"登录异常: {str(e)}")
print(f"登录异常: {str(e)}", file=sys.stderr)
# 异常情况,释放临时浏览器
if session_id and browser_pool:
try:
await browser_pool.release_temp_browser(session_id)
print(f"[登录验证] 已释放临时浏览器: {session_id}", file=sys.stderr)
except Exception as release_error:
print(f"[登录验证] 释放临时浏览器失败: {str(release_error)}", file=sys.stderr)
return BaseResponse(
code=1,
message=f"登录失败: {str(e)}",
@@ -140,31 +524,99 @@ async def login(request: LoginRequest):
@app.get("/")
async def root():
"""健康检查"""
return {"status": "ok", "message": "小红书登录服务运行中"}
if browser_pool:
stats = browser_pool.get_stats()
return {
"status": "ok",
"message": "小红书登录服务运行中(浏览器池模式)",
"browser_pool": stats
}
return {"status": "ok", "message": "服务初始化中..."}
@app.get("/api/health")
async def health_check():
"""健康检查接口(详细)"""
if browser_pool:
stats = browser_pool.get_stats()
return {
"status": "healthy",
"service": "xhs-login-service",
"mode": "browser-pool",
"browser_pool_stats": stats,
"timestamp": datetime.now().isoformat()
}
return {
"status": "initializing",
"service": "xhs-login-service",
"timestamp": datetime.now().isoformat()
}
@app.post("/api/xhs/inject-cookies", response_model=BaseResponse)
async def inject_cookies(request: InjectCookiesRequest):
"""
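Inject cookies and verify the login status
Allows skipping login by reusing previously saved cookies
Inject cookies or a full login state and verify it
Two modes are supported:
1. Cookies only (legacy-compatible)
2. Full login_state (cookies + localStorage + sessionStorage)
The verification target can be either the creator center or the Xiaohongshu home page
Important: to avoid detection, the browser pool is not used; a fresh browser instance is created for every call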
"""
try:
# 关闭旧的浏览器(如果有)
if login_service.browser:
await login_service.close_browser()
# 使用Cookies初始化浏览器
await login_service.init_browser(cookies=request.cookies)
# 创建一个独立的登录服务实例,不使用浏览器
print("✅ 为注入Cookie创建全新的浏览器实例不使用浏览器池", file=sys.stderr)
inject_service = XHSLoginService(use_pool=False, headless=False) # 不使用浏览器池,使用有头模式方便调试
# 验证登录状态
result = await login_service.verify_login_status()
# 优先使用login_state其次使用cookies
if request.login_state:
# 新版使用完整的login_state
print("✅ 检测到login_state将恢复完整登录状态", file=sys.stderr)
# 保存login_state到文件供 init_browser 加载
with open('login_state.json', 'w', encoding='utf-8') as f:
json.dump(request.login_state, f, ensure_ascii=False, indent=2)
# 使用restore_state=True恢复完整状态
await inject_service.init_browser(restore_state=True)
elif request.cookies:
# 兼容旧版仅使用Cookies
print("⚠️ 检测到仅有Cookies建议使用login_state获取更好的兼容性", file=sys.stderr)
await inject_service.init_browser(cookies=request.cookies)
else:
return BaseResponse(
code=1,
message="请提供 cookies 或 login_state",
data=None
)
# 根据target_page参数确定验证URL
target_page = request.target_page or "creator"
if target_page == "home":
verify_url = "https://www.xiaohongshu.com"
page_name = "小红书首页"
else:
verify_url = "https://creator.xiaohongshu.com"
page_name = "创作者中心"
# 访问目标页面并验证登录状态
result = await inject_service.verify_login_status(url=verify_url)
# 关闭独立的浏览器实例(注:因为不是池模式,会真正关闭)
# await inject_service.close_browser() # 先不关闭,让用户看到结果
if result.get("logged_in"):
return BaseResponse(
code=0,
message="Cookie注入成功已登录",
message=f"{'login_state' if request.login_state else 'Cookie'}注入成功,已跳转到{page_name}",
data={
"logged_in": True,
"target_page": page_name,
"user_info": result.get("user_info"),
"cookies": result.get("cookies"), # 键值对格式
"cookies_full": result.get("cookies_full"), # Playwright完整格式
@@ -172,37 +624,117 @@ async def inject_cookies(request: InjectCookiesRequest):
}
)
else:
# 失败时关闭浏览器
await inject_service.close_browser()
return BaseResponse(
code=1,
message=result.get("message", "Cookie已失效,请重新登录"),
message=result.get("message", f"{'login_state' if request.login_state else 'Cookie'}已失效,请重新登录"),
data={
"logged_in": False
}
)
except Exception as e:
print(f"注入Cookies异常: {str(e)}")
print(f"注入失败: {str(e)}", file=sys.stderr)
import traceback
traceback.print_exc()
return BaseResponse(
code=1,
message=f"注入失败: {str(e)}",
data=None
)
@app.post("/api/xhs/publish", response_model=BaseResponse)
async def publish_note(request: PublishNoteRequest):
@app.post("/api/xhs/publish-with-cookies", response_model=BaseResponse)
async def publish_note_with_cookies(request: PublishWithCookiesRequest):
"""
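Publish a note
After login, an image-and-text note can be published to Xiaohongshu
Publish a note using cookies, a full login_state, or a storage_state file (called by the Go backend's scheduled task)
Three modes are supported, in priority order:
1. storage_state_path (recommended: the most complete login state)
2. Full login_state (second choice: cookies + localStorage + sessionStorage)
3. Cookies only (legacy-compatible)
Important: to avoid detection, the browser pool is not used; a fresh browser instance is created for every call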
"""
try:
# 调用登录服务发布笔记
result = await login_service.publish_note(
# 获取代理(如果启用)
proxy = await fetch_proxy_from_pool()
if proxy:
print(f"[发布接口] 使用代理: {proxy}", file=sys.stderr)
# 创建一个独立的登录服务实例,不使用浏览器池,应用所有反检测措施
print("✅ 为发布任务创建全新的浏览器实例,不使用浏览器池", file=sys.stderr)
# 从配置读取headless参数
config = get_config()
headless = config.get_bool('scheduler.headless', True)
publish_service = XHSLoginService(use_pool=False, headless=headless) # 不使用浏览器池
# 优先级判断storage_state_path > login_state > cookies
if request.storage_state_path or request.phone:
# 模式1使用storage_state最优先
storage_state_file = None
if request.storage_state_path:
# 直接指定了storage_state路径
storage_state_file = request.storage_state_path
elif request.phone:
# 根据手机号查找
storage_state_dir = 'storage_states'
storage_state_file = os.path.join(storage_state_dir, f"xhs_{request.phone}.json")
if storage_state_file and os.path.exists(storage_state_file):
print(f"✅ 检测到storage_state文件: {storage_state_file}将使用Playwright原生恢复", file=sys.stderr)
# 使用Playwright原生API恢复登录状态
await publish_service.init_browser_with_storage_state(
storage_state_path=storage_state_file,
proxy=proxy
)
else:
print(f"⚠️ storage_state文件不存在: {storage_state_file}回退到login_state或cookies模式", file=sys.stderr)
# 回退到旧模式
if request.login_state:
await _init_with_login_state(publish_service, request.login_state, proxy)
elif request.cookies:
await publish_service.init_browser(cookies=request.cookies, proxy=proxy)
else:
return BaseResponse(
code=1,
message="storage_state文件不存在且未提供 login_state 或 cookies",
data=None
)
elif request.login_state:
# 模式2使用login_state
print("✅ 检测到login_state将恢复完整登录状态", file=sys.stderr)
await _init_with_login_state(publish_service, request.login_state, proxy)
elif request.cookies:
# 模式3仅使用Cookies兼容旧版
print("⚠️ 检测到仅有Cookies建议使用storage_state或login_state获取更好的兼容性", file=sys.stderr)
await publish_service.init_browser(cookies=request.cookies, proxy=proxy)
else:
return BaseResponse(
code=1,
message="请提供 storage_state_path、phone、login_state 或 cookies",
data=None
)
# 调用发布方法使用已经初始化好的publish_service
result = await publish_service.publish_note(
title=request.title,
content=request.content,
images=request.images,
topics=request.topics
topics=request.topics,
cookies=None, # 已经注入,不需要再传
proxy=None, # 已经设置,不需要再传
)
# 关闭独立的浏览器实例
await publish_service.close_browser()
if result["success"]:
return BaseResponse(
code=0,
@@ -220,13 +752,55 @@ async def publish_note(request: PublishNoteRequest):
)
except Exception as e:
print(f"发布笔记异常: {str(e)}")
print(f"发布笔记异常: {str(e)}", file=sys.stderr)
import traceback
traceback.print_exc()
return BaseResponse(
code=1,
message=f"发布失败: {str(e)}",
data=None
)
async def _init_with_login_state(publish_service, login_state, proxy):
    """Initialise the browser from a full login_state (cookies + localStorage + sessionStorage)."""
    # Start the browser with the saved cookies and user agent
    await publish_service.init_browser(
        cookies=login_state.get('cookies'),
        proxy=proxy,
        user_agent=login_state.get('user_agent')
    )

    # Restore localStorage and sessionStorage; a failure here is logged but not fatal
    try:
        if login_state.get('localStorage') or login_state.get('sessionStorage'):
            target_url = login_state.get('url', 'https://creator.xiaohongshu.com')
            await publish_service.page.goto(target_url, wait_until='domcontentloaded', timeout=15000)

            # Pass key and value as arguments instead of interpolating them into the script,
            # so quotes or backslashes in a key cannot break the JavaScript
            for storage_name in ('localStorage', 'sessionStorage'):
                for key, value in (login_state.get(storage_name) or {}).items():
                    await publish_service.page.evaluate(
                        f'([k, v]) => {storage_name}.setItem(k, v)',
                        [key, value]
                    )

            print("✅ 已恢复localStorage和sessionStorage", file=sys.stderr)
    except Exception as e:
        print(f"⚠️ 恢复storage失败: {str(e)}", file=sys.stderr)
@app.post("/api/xhs/upload-images")
async def upload_images(files: List[UploadFile] = File(...)):
"""
@@ -279,4 +853,20 @@ async def upload_images(files: List[UploadFile] = File(...)):
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)
# 从配置文件读取服务器配置
config = get_config()
host = config.get_str('server.host', '0.0.0.0')
port = config.get_int('server.port', 8000)
debug = config.get_bool('server.debug', False)
reload = config.get_bool('server.reload', False)
print(f"[启动服务] 主机: {host}, 端口: {port}, 调试: {debug}, 热重载: {reload}")
uvicorn.run(
app,
host=host,
port=port,
reload=reload,
log_level="debug" if debug else "info"
)
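
The publish endpoint above picks a login mode in a fixed priority order: a storage_state file (given directly or derived from the phone number), then `login_state`, then plain cookies. The following is a minimal sketch of that decision as a pure function, which may make the fallback easier to test in isolation. The name `choose_login_mode`, its return shape, and the injectable `exists` parameter are assumptions for illustration and are not part of the commit.

```python
import os
from typing import Callable, Optional, Tuple


def choose_login_mode(
    storage_state_path: Optional[str] = None,
    phone: Optional[str] = None,
    login_state: Optional[dict] = None,
    cookies: Optional[list] = None,
    storage_state_dir: str = "storage_states",
    exists: Callable[[str], bool] = os.path.exists,  # hypothetical hook, eases testing
) -> Tuple[str, Optional[str]]:
    """Return (mode, storage_state_file) using the endpoint's priority order."""
    # An explicit path wins; otherwise derive the file name from the phone number
    candidate = storage_state_path
    if not candidate and phone:
        candidate = os.path.join(storage_state_dir, f"xhs_{phone}.json")

    # Mode 1: storage_state, but only if the file is really there
    if candidate and exists(candidate):
        return "storage_state", candidate
    # Mode 2: full login_state (also the fallback when the file is missing)
    if login_state:
        return "login_state", None
    # Mode 3: cookies only
    if cookies:
        return "cookies", None
    # Nothing usable was supplied; the caller should return an error response
    return "none", None
```

With a helper like this, the handler body reduces to one call followed by a branch on the returned mode, and the "file missing" fallback no longer needs its own nested copy of the `login_state` and cookies checks.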

157
backend/oss_utils.py Normal file

@@ -0,0 +1,157 @@
"""
阿里云OSS工具类
用于Python脚本中上传/下载文件到OSS
"""
import os
import oss2
from datetime import datetime
from typing import Optional
class OSSUploader:
"""OSS上传工具"""
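    def __init__(
        self,
        access_key_id: Optional[str] = None,
        access_key_secret: Optional[str] = None,
        bucket_name: Optional[str] = None,
        endpoint: Optional[str] = None
    ):
        """
        Initialise the OSS client.

        Args:
            access_key_id: AccessKey ID (optional; read from the environment by default)
            access_key_secret: AccessKey Secret (optional; read from the environment by default)
            bucket_name: bucket name (optional; read from the environment by default)
            endpoint: OSS endpoint (optional; read from the environment by default)
        """
        # Prefer explicit arguments, then environment variables.
        # Credentials have no built-in default, so secrets never live in the source tree.
        self.access_key_id = access_key_id or os.getenv('OSS_TEST_ACCESS_KEY_ID')
        self.access_key_secret = access_key_secret or os.getenv('OSS_TEST_ACCESS_KEY_SECRET')
        self.bucket_name = bucket_name or os.getenv('OSS_TEST_BUCKET', 'bxmkb-beijing')
        self.endpoint = endpoint or os.getenv('OSS_TEST_ENDPOINT', 'https://oss-cn-beijing.aliyuncs.com/')

        if not self.access_key_id or not self.access_key_secret:
            raise ValueError(
                "OSS credentials missing: set OSS_TEST_ACCESS_KEY_ID and OSS_TEST_ACCESS_KEY_SECRET"
            )

        # Strip the scheme and any trailing slash so the endpoint can be embedded
        # in public URLs as https://{bucket}.{endpoint}/{object}
        self.endpoint = self.endpoint.replace('https://', '').replace('http://', '').rstrip('/')

        # Create the auth object
        self.auth = oss2.Auth(self.access_key_id, self.access_key_secret)

        # Create the bucket object
        self.bucket = oss2.Bucket(self.auth, self.endpoint, self.bucket_name)

        # Base path for all uploaded objects
        self.base_path = "wht/"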
def upload_file(self, local_file_path: str, object_name: Optional[str] = None) -> str:
"""
上传文件到OSS
Args:
local_file_path: 本地文件路径
object_name: OSS对象名称可选默认自动生成
Returns:
OSS文件的完整URL
"""
# 如果未指定对象名称,自动生成
if object_name is None:
# 生成格式: wht/YYYYMMDD/timestamp_filename.ext
now = datetime.now()
date_dir = now.strftime("%Y%m%d")
timestamp = int(now.timestamp())
filename = os.path.basename(local_file_path)
object_name = f"{self.base_path}{date_dir}/{timestamp}_{filename}"
# 上传文件
self.bucket.put_object_from_file(object_name, local_file_path)
# 生成访问URL
url = f"https://{self.bucket_name}.{self.endpoint}/{object_name}"
return url
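    def upload_bytes(self, data: bytes, filename: str) -> str:
        """
        Upload raw bytes to OSS.

        Args:
            data: file content as bytes
            filename: file name (used for the object name and its extension)

        Returns:
            Full URL of the uploaded OSS object
        """
        # Build the object name: wht/YYYYMMDD/timestamp_filename.ext
        now = datetime.now()
        date_dir = now.strftime("%Y%m%d")
        timestamp = int(now.timestamp())

        # Fall back to .jpg when the file name carries no extension
        base, ext = os.path.splitext(os.path.basename(filename))
        ext = ext or '.jpg'
        object_name = f"{self.base_path}{date_dir}/{timestamp}_{base}{ext}"

        # Upload the data
        self.bucket.put_object(object_name, data)

        # Build the public URL
        url = f"https://{self.bucket_name}.{self.endpoint}/{object_name}"
        return url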
def delete_file(self, file_url: str) -> bool:
"""
从OSS删除文件
Args:
file_url: OSS文件的完整URL
Returns:
是否删除成功
"""
try:
# 从URL中提取对象名称
# 格式: https://bucket.endpoint/path/file.jpg
prefix = f"https://{self.bucket_name}.{self.endpoint}/"
if file_url.startswith(prefix):
object_name = file_url[len(prefix):]
self.bucket.delete_object(object_name)
return True
else:
return False
except Exception as e:
print(f"删除OSS文件失败: {e}")
return False
def file_exists(self, file_url: str) -> bool:
"""
检查OSS文件是否存在
Args:
file_url: OSS文件的完整URL
Returns:
文件是否存在
"""
try:
prefix = f"https://{self.bucket_name}.{self.endpoint}/"
if file_url.startswith(prefix):
object_name = file_url[len(prefix):]
return self.bucket.object_exists(object_name)
else:
return False
except Exception:
return False
# 创建默认实例(使用环境变量配置)
default_uploader = None
def get_oss_uploader() -> OSSUploader:
"""获取默认的OSS上传器实例"""
global default_uploader
if default_uploader is None:
default_uploader = OSSUploader()
return default_uploader
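
The uploader above builds object names as `wht/YYYYMMDD/timestamp_filename` and later recovers them from the public URL by stripping a `https://{bucket}.{endpoint}/` prefix. The sketch below isolates that round trip as pure functions so it can be checked without network access. The helper names (`build_object_key`, `public_url`, `key_from_url`) are hypothetical and not part of `oss_utils.py`; the trailing-slash handling is an assumption about how the endpoint should be normalised.

```python
import os
from datetime import datetime
from typing import Optional


def build_object_key(filename: str, now: datetime, base_path: str = "wht/") -> str:
    """Build an object key of the form wht/YYYYMMDD/<timestamp>_<name><ext>."""
    base, ext = os.path.splitext(os.path.basename(filename))
    ext = ext or ".jpg"  # default extension when the name has none
    return f"{base_path}{now.strftime('%Y%m%d')}/{int(now.timestamp())}_{base}{ext}"


def public_url(bucket: str, endpoint: str, key: str) -> str:
    """Build the public URL; scheme and trailing slash are stripped from the endpoint."""
    host = endpoint.replace("https://", "").replace("http://", "").rstrip("/")
    return f"https://{bucket}.{host}/{key}"


def key_from_url(bucket: str, endpoint: str, url: str) -> Optional[str]:
    """Recover the object key from a URL, or None if it belongs to another bucket."""
    prefix = public_url(bucket, endpoint, "")
    return url[len(prefix):] if url.startswith(prefix) else None
```

Normalising the endpoint in one place matters here: if a trailing slash survives, generated URLs contain `//` before the object key, and the prefix match used for deletion and existence checks can disagree with URLs stored earlier.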


@@ -0,0 +1,66 @@
"""
固定代理IP测试总结报告
"""
print("="*60)
print("🎯 固定代理IP测试总结报告")
print("="*60)
print("\n📋 测试概览:")
print(" • 测试项目: 固定代理IP在小红书登录发文功能中的可用性")
print(" • 测试时间: 2025年12月26日")
print(" • 测试环境: Windows 10, Python虚拟环境")
print(" • 测试代理数量: 2个")
print("\n✅ 代理IP详细信息:")
print(" 代理1:")
print(" - 服务器: http://36.137.177.131:50001")
print(" - 用户名: qqwvy0")
print(" - 密码: mun3r7xz")
print(" - 状态: ✅ 可用")
print("")
print(" 代理2:")
print(" - 服务器: http://111.132.40.72:50002")
print(" - 用户名: ih3z07")
print(" - 密码: 078bt7o5")
print(" - 状态: ✅ 可用")
print("\n🧪 测试项目及结果:")
print(" 1. requests库连接测试:")
print(" - 代理1: ✅ 成功")
print(" - 代理2: ✅ 成功")
print(" - 结论: 代理IP基础连接正常")
print("")
print(" 2. Playwright浏览器代理测试:")
print(" - 代理1: ✅ 成功 (可访问小红书创作者平台)")
print(" - 代理2: ✅ 成功 (可访问小红书创作者平台)")
print(" - 结论: 代理IP在浏览器环境中正常工作")
print("")
print(" 3. 网站访问能力测试:")
print(" - 百度访问: ✅ 成功")
print(" - IP检测网站: ✅ 成功")
print(" - 小红书创作者平台: ✅ 成功")
print(" - 结论: 代理IP未被目标网站封禁")
print("\n📊 测试结果汇总:")
print(" • 总体成功率: 100% (2/2 个代理可用)")
print(" • 网络延迟: 良好")
print(" • 稳定性: 良好")
print(" • 适用场景: 小红书登录及发文功能")
print("\n🔧 Playwright代理格式:")
print(" 代理1格式: http://qqwvy0:mun3r7xz@36.137.177.131:50001")
print(" 代理2格式: http://ih3z07:078bt7o5@111.132.40.72:50002")
print("\n💡 使用建议:")
print(" 1. 在小红书自动化脚本中可以使用以上两个代理IP")
print(" 2. 建议轮换使用两个代理以提高稳定性")
print(" 3. 如遇到访问限制可尝试调整User-Agent或请求间隔")
print(" 4. 代理IP可以有效隐藏真实IP降低被封禁风险")
print("\n🎉 总结:")
print(" 两个固定代理IP均可以正常用于小红书登录发文功能")
print(" 网络连接稳定,未检测到访问限制或验证码拦截。")
print("\n" + "="*60)
print("报告生成完成")
print("="*60)


@@ -0,0 +1,230 @@
"""
固定代理IP下小红书登录和发文功能示例
展示如何在实际应用中使用代理IP进行小红书操作
"""
import asyncio
import json
import sys
from xhs_login import XHSLoginService
from xhs_publish import XHSPublishService
from damai_proxy_config import get_proxy_config
async def login_with_proxy(phone: str, code: str, proxy_index: int = 0):
"""
使用代理进行小红书登录
Args:
phone: 手机号
code: 验证码
proxy_index: 代理索引 (0 或 1)
"""
print(f"\n{'='*60}")
print(f"📱 使用代理登录小红书")
print(f"{'='*60}")
# 获取代理配置
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
# 创建登录服务
login_service = XHSLoginService()
try:
# 初始化浏览器(使用代理)
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
print("✅ 浏览器初始化成功(已启用代理)")
# 执行登录
result = await login_service.login(phone, code)
if result.get('success'):
print("✅ 登录成功!")
# 保存Cookies到文件
cookies_full = result.get('cookies_full', [])
if cookies_full:
with open('cookies_proxy.json', 'w', encoding='utf-8') as f:
json.dump(cookies_full, f, ensure_ascii=False, indent=2)
print("✅ 已保存登录Cookies到 cookies_proxy.json")
return result
else:
print(f"❌ 登录失败: {result.get('error')}")
return result
except Exception as e:
print(f"❌ 登录过程异常: {str(e)}")
import traceback
traceback.print_exc()
return {"success": False, "error": str(e)}
finally:
await login_service.close_browser()
async def publish_with_proxy(title: str, content: str, images: list = None, tags: list = None, proxy_index: int = 0, cookies_file: str = 'cookies.json'):
"""
使用代理发布小红书笔记
Args:
title: 笔记标题
content: 笔记内容
images: 图片路径列表
tags: 标签列表
proxy_index: 代理索引 (0 或 1)
cookies_file: Cookies文件路径
"""
print(f"\n{'='*60}")
print(f"📝 使用代理发布小红书笔记")
print(f"{'='*60}")
# 获取代理配置
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
# 读取Cookies
try:
with open(cookies_file, 'r', encoding='utf-8') as f:
cookies = json.load(f)
print(f"✅ 成功读取Cookies: {len(cookies)}")
except FileNotFoundError:
print(f"❌ Cookies文件不存在: {cookies_file}")
return {"success": False, "error": f"Cookies文件不存在: {cookies_file}"}
except Exception as e:
print(f"❌ 读取Cookies失败: {str(e)}")
return {"success": False, "error": str(e)}
# 准备发布数据
images = images or []
tags = tags or []
print(f"📝 发布内容:")
print(f" 标题: {title}")
print(f" 内容: {content[:50]}...") # 只显示前50个字符
print(f" 图片: {len(images)}")
print(f" 标签: {tags}")
# 创建发布服务
try:
publisher = XHSPublishService(cookies, proxy=proxy_url)
# 执行发布
result = await publisher.publish(
title=title,
content=content,
images=images,
tags=tags
)
if result.get('success'):
print("✅ 发布成功!")
else:
print(f"❌ 发布失败: {result.get('error')}")
return result
except Exception as e:
print(f"❌ 发布过程异常: {str(e)}")
import traceback
traceback.print_exc()
return {"success": False, "error": str(e)}
async def test_proxy_functionality():
"""测试代理功能的完整流程"""
print("🚀 开始测试代理功能完整流程")
# 1. 测试代理连接
print(f"\n{'-'*40}")
print("1. 测试代理连接...")
for i in range(2):
proxy_config = get_proxy_config(i)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f" 代理{i+1}: {proxy_config['server']} - {proxy_url}")
# 2. 演示如何使用代理登录(仅展示,不实际执行)
print(f"\n{'-'*40}")
print("2. 代理登录示例(代码演示)...")
print("""
# 登录示例代码:
async def example_login():
result = await login_with_proxy(
phone="你的手机号", # 实际手机号
code="验证码", # 实际验证码
proxy_index=0 # 使用代理1
)
return result
""")
# 3. 演示如何使用代理发布(仅展示,不实际执行)
print(f"\n{'-'*40}")
print("3. 代理发布示例(代码演示)...")
print("""
# 发布示例代码:
async def example_publish():
result = await publish_with_proxy(
title="测试标题",
content="测试内容",
images=["图片路径1", "图片路径2"], # 可选
tags=["标签1", "标签2"], # 可选
proxy_index=1, # 使用代理2
cookies_file="cookies.json" # Cookies文件路径
)
return result
""")
# 4. 代理轮换策略
print(f"\n{'-'*40}")
print("4. 代理轮换策略...")
print("""
# 代理轮换示例:
class ProxyManager:
def __init__(self):
self.current_proxy = 0
def get_next_proxy(self):
proxy_config = get_proxy_config(self.current_proxy)
self.current_proxy = (self.current_proxy + 1) % 2 # 循环使用两个代理
return proxy_config
""")
print(f"\n{'-'*40}")
print("✅ 代理功能演示完成!")
def main():
"""主函数"""
print("="*60)
print("🎯 固定代理IP下小红书登录发文功能示例")
print("="*60)
# 运行测试
asyncio.run(test_proxy_functionality())
print(f"\n{'='*60}")
print("💡 使用说明:")
print(" 1. 使用 login_with_proxy() 函数进行带代理的登录")
print(" 2. 使用 publish_with_proxy() 函数进行带代理的发布")
print(" 3. 可以轮换使用两个代理IP以提高稳定性")
print(" 4. 代理配置在 damai_proxy_config.py 中管理")
print("="*60)
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
main()
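
The example file above assembles `http://user:password@host:port` by string formatting in three places and shows rotation only as printed text. The following sketch gathers both ideas into small, testable helpers. It assumes each config is a dict with `server`, `username` and `password` keys, as returned by `get_proxy_config` in the file above; the names `build_proxy_url`, `to_playwright_proxy` and `ProxyRotator` are illustrative and not part of the commit. Credentials are percent-encoded so that characters such as `@` or `:` in a password do not corrupt the URL.

```python
from itertools import cycle
from typing import Dict, Iterable
from urllib.parse import quote


def build_proxy_url(config: Dict[str, str]) -> str:
    """Build http://user:password@host:port with percent-encoded credentials."""
    server = config["server"].replace("http://", "")
    user = quote(config["username"], safe="")
    password = quote(config["password"], safe="")
    return f"http://{user}:{password}@{server}"


def to_playwright_proxy(config: Dict[str, str]) -> Dict[str, str]:
    """Return the dict shape Playwright accepts for its `proxy` launch option."""
    return {
        "server": config["server"],
        "username": config["username"],
        "password": config["password"],
    }


class ProxyRotator:
    """Round-robin over any number of proxy configs (not just two)."""

    def __init__(self, configs: Iterable[Dict[str, str]]):
        configs = list(configs)
        if not configs:
            raise ValueError("at least one proxy config is required")
        self._cycle = cycle(configs)

    def next_proxy(self) -> Dict[str, str]:
        return next(self._cycle)
```

Keeping credentials in a separate dict rather than inside the URL also makes it easier to log only the `server` field, so passwords stay out of console output.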

65
backend/rebuild_venv.bat Normal file

@@ -0,0 +1,65 @@
@echo off
chcp 65001 >nul
echo ========================================
echo 重建虚拟环境使用标准Python
echo ========================================
echo.
cd /d %~dp0
echo [步骤1] 删除旧的虚拟环境...
if exist venv (
rmdir /s /q venv
echo [完成] 旧虚拟环境已删除
) else (
echo [提示] 没有找到旧虚拟环境
)
echo.
echo [步骤2] 使用标准Python 3.12创建新虚拟环境...
py -3.12 -m venv venv
if %errorlevel% neq 0 (
echo [错误] 虚拟环境创建失败
pause
exit /b 1
)
echo [完成] 虚拟环境创建成功
echo.
echo [步骤3] 验证新虚拟环境的Python路径...
venv\Scripts\python.exe -c "import sys; print('Python可执行文件:', sys.executable); print('Python版本:', sys.version); print('\nsys.path前5行:'); [print(p) for i, p in enumerate(sys.path[:5])]"
echo.
echo [步骤4] 升级pip...
venv\Scripts\python.exe -m pip install --upgrade pip -i https://pypi.tuna.tsinghua.edu.cn/simple
echo.
echo [步骤5] 配置pip使用清华镜像...
venv\Scripts\pip.exe config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
echo [完成] pip镜像源已配置
echo.
echo [步骤6] 安装项目依赖...
venv\Scripts\pip.exe install -r requirements.txt
if %errorlevel% neq 0 (
echo [错误] 依赖安装失败
pause
exit /b 1
)
echo [完成] 依赖安装成功
echo.
echo [步骤7] 安装Playwright浏览器...
venv\Scripts\playwright.exe install chromium
if %errorlevel% neq 0 (
echo [警告] Playwright浏览器安装可能失败请手动检查
)
echo.
echo ========================================
echo 虚拟环境重建完成!
echo ========================================
echo.
echo 现在可以运行 start_service.bat 启动服务
echo.
pause


@@ -4,3 +4,12 @@ playwright==1.40.0
pydantic==2.5.0
python-multipart==0.0.6
aiohttp==3.9.1
oss2==2.18.4
APScheduler==3.10.4
PyMySQL==1.1.0
python-dotenv==1.0.0
PyYAML==6.0.1
alibabacloud_dysmsapi20170525==2.0.24
alibabacloud_credentials==0.3.4
alibabacloud_tea_openapi==0.3.9
alibabacloud_tea_util==0.3.13
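
The `APScheduler==3.10.4` pin above backs the six-field cron strings used for `SCHEDULER_CRON` (second, minute, hour, day, month, day of week). As a rough sketch, the mapping from such a string to `CronTrigger` keyword arguments can be written as a pure function. The name `cron_fields` and the acceptance of five-field expressions are assumptions for illustration, not behaviour of the committed scheduler.

```python
from typing import Dict

# Keyword names accepted by APScheduler's CronTrigger, in six-field order
_FIELD_NAMES = ("second", "minute", "hour", "day", "month", "day_of_week")


def cron_fields(expr: str) -> Dict[str, str]:
    """Map a 5- or 6-field cron expression to CronTrigger keyword arguments."""
    parts = expr.split()
    if len(parts) == 6:
        # second minute hour day month day_of_week
        return dict(zip(_FIELD_NAMES, parts))
    if len(parts) == 5:
        # Classic crontab form without a seconds field
        return dict(zip(_FIELD_NAMES[1:], parts))
    raise ValueError(f"expected 5 or 6 cron fields, got {len(parts)}: {expr!r}")
```

A caller could then build the trigger with `CronTrigger(**cron_fields(expr))`. Raising on a malformed expression, rather than silently falling back to a five-second interval, keeps a configuration typo from turning into an unexpectedly aggressive publish schedule.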

563
backend/scheduler.py Normal file

@@ -0,0 +1,563 @@
"""
小红书定时发布调度器
管理自动发布任务的调度和执行
"""
import asyncio
import sys
import random
from datetime import datetime, time as dt_time
from typing import List, Dict, Any, Optional
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger
import pymysql
import json
import aiohttp
from xhs_login import XHSLoginService
class XHSScheduler:
"""小红书定时发布调度器"""
def __init__(self,
db_config: Dict[str, Any],
max_concurrent: int = 2,
publish_timeout: int = 300,
max_articles_per_user_per_run: int = 5,
max_failures_per_user_per_run: int = 3,
max_daily_articles_per_user: int = 20,
max_hourly_articles_per_user: int = 3,
proxy_pool_enabled: bool = False,
proxy_pool_api_url: Optional[str] = None,
enable_random_ua: bool = True,
min_publish_interval: int = 30,
max_publish_interval: int = 120,
headless: bool = True):
"""
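Initialise the scheduler.
Args:
    db_config: database connection settings
    max_concurrent: maximum number of concurrent publish tasks
    publish_timeout: publish timeout in seconds
    max_articles_per_user_per_run: maximum articles per user per run
    max_failures_per_user_per_run: maximum failures per user per run
    max_daily_articles_per_user: maximum articles per user per day
    max_hourly_articles_per_user: maximum articles per user per hour
    proxy_pool_enabled: whether to fetch a proxy from the proxy pool before publishing
    proxy_pool_api_url: URL of the proxy pool API
    enable_random_ua: whether to use a random User-Agent
    min_publish_interval: minimum interval between publishes, in seconds
    max_publish_interval: maximum interval between publishes, in seconds
    headless: whether to run the browser headless (False shows the browser window, which helps debugging)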
"""
self.db_config = db_config
self.max_concurrent = max_concurrent
self.publish_timeout = publish_timeout
self.max_articles_per_user_per_run = max_articles_per_user_per_run
self.max_failures_per_user_per_run = max_failures_per_user_per_run
self.max_daily_articles_per_user = max_daily_articles_per_user
self.max_hourly_articles_per_user = max_hourly_articles_per_user
self.proxy_pool_enabled = proxy_pool_enabled
self.proxy_pool_api_url = proxy_pool_api_url or ""
self.enable_random_ua = enable_random_ua
self.min_publish_interval = min_publish_interval
self.max_publish_interval = max_publish_interval
self.headless = headless
self.scheduler = AsyncIOScheduler()
self.login_service = XHSLoginService(use_pool=True, headless=headless)
self.semaphore = asyncio.Semaphore(max_concurrent)
print(f"[调度器] 已创建,最大并发: {max_concurrent}", file=sys.stderr)
def start(self, cron_expr: str = "*/5 * * * * *"):
"""
Start the scheduled job.
Args:
    cron_expr: cron expression (default: every 5 seconds)
        Format: second minute hour day month day-of-week
"""
# 解析cron表达式
parts = cron_expr.split()
if len(parts) == 6:
# 6位格式: 秒 分 时 日 月 周
trigger = CronTrigger(
second=parts[0],
minute=parts[1],
hour=parts[2],
day=parts[3],
month=parts[4],
day_of_week=parts[5]
)
else:
print(f"[调度器] ⚠️ Cron表达式格式错误: {cron_expr},使用默认配置", file=sys.stderr)
trigger = CronTrigger(second="*/5")
self.scheduler.add_job(
self.auto_publish_articles,
trigger=trigger,
id='xhs_publish',
name='小红书自动发布',
max_instances=1, # 最多只允许1个实例同时运行防止重复执行
replace_existing=True # 如果任务已存在则替换,避免重启时重复添加
)
self.scheduler.start()
print(f"[调度器] 定时发布任务已启动Cron表达式: {cron_expr}", file=sys.stderr)
def stop(self):
"""停止定时任务"""
self.scheduler.shutdown()
print("[调度器] 定时发布任务已停止", file=sys.stderr)
def get_db_connection(self):
"""获取数据库连接"""
return pymysql.connect(
host=self.db_config['host'],
port=self.db_config['port'],
user=self.db_config['user'],
password=self.db_config['password'],
database=self.db_config['database'],
charset='utf8mb4',
cursorclass=pymysql.cursors.DictCursor
)
async def _fetch_proxy_from_pool(self) -> Optional[str]:
"""从代理池接口获取一个代理地址http://ip:port"""
if not self.proxy_pool_enabled or not self.proxy_pool_api_url:
return None
try:
timeout = aiohttp.ClientTimeout(total=10)
async with aiohttp.ClientSession(timeout=timeout) as session:
async with session.get(self.proxy_pool_api_url) as resp:
if resp.status != 200:
print(f"[调度器] 代理池接口返回非200状态码: {resp.status}", file=sys.stderr)
return None
text = (await resp.text()).strip()
if not text:
print("[调度器] 代理池返回内容为空", file=sys.stderr)
return None
line = text.splitlines()[0].strip()
if not line:
print("[调度器] 代理池首行内容为空", file=sys.stderr)
return None
if line.startswith("http://") or line.startswith("https://"):
return line
return "http://" + line
except Exception as e:
print(f"[调度器] 请求代理池接口失败: {str(e)}", file=sys.stderr)
return None
    def _generate_random_user_agent(self) -> str:
        """Generate a random User-Agent to make browser fingerprinting harder."""
        chrome_versions = ['120.0.0.0', '119.0.0.0', '118.0.0.0', '117.0.0.0', '116.0.0.0']
        # Chrome reports "Windows NT 10.0" on both Windows 10 and Windows 11;
        # a "Windows NT 11.0" token does not occur in real traffic and would itself stand out
        win_ver = 'Windows NT 10.0; Win64; x64'
        chrome_ver = random.choice(chrome_versions)
        return f'Mozilla/5.0 ({win_ver}) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{chrome_ver} Safari/537.36'
async def auto_publish_articles(self):
"""自动发布文案(定时任务主函数)"""
print("========== 开始执行定时发布任务 ==========", file=sys.stderr)
start_time = datetime.now()
try:
conn = self.get_db_connection()
cursor = conn.cursor()
# 1. 查询所有待发布的文案
cursor.execute("""
SELECT * FROM ai_articles
WHERE status = 'published_review'
ORDER BY id ASC
""")
articles = cursor.fetchall()
if not articles:
print("没有待发布的文案", file=sys.stderr)
cursor.close()
conn.close()
return
original_total = len(articles)
# 2. 限制每用户每轮发文数
articles = self._limit_articles_per_user(articles, self.max_articles_per_user_per_run)
print(f"找到 {original_total} 篇待发布文案,按照每个用户每轮最多 {self.max_articles_per_user_per_run} 篇,本次计划发布 {len(articles)}", file=sys.stderr)
# 3. 应用每日/每小时上限过滤
if self.max_daily_articles_per_user > 0 or self.max_hourly_articles_per_user > 0:
before_count = len(articles)
articles = await self._filter_by_daily_and_hourly_limit(
cursor, articles,
self.max_daily_articles_per_user,
self.max_hourly_articles_per_user
)
print(f"应用每日/每小时上限过滤:过滤前 {before_count} 篇,过滤后 {len(articles)}", file=sys.stderr)
if not articles:
print("所有文案均因频率限制被过滤,本轮无任务", file=sys.stderr)
cursor.close()
conn.close()
return
# 4. 并发发布
tasks = []
user_fail_count = {}
paused_users = set()
for article in articles:
user_id = article['publish_user_id'] or article['created_user_id']
if user_id in paused_users:
print(f"用户 {user_id} 在本轮已暂停,跳过文案 ID: {article['id']}", file=sys.stderr)
continue
# 直接发布,不在这里延迟
task = asyncio.create_task(
self._publish_article_with_semaphore(
article, user_id, cursor, user_fail_count, paused_users
)
)
tasks.append(task)
# 等待所有发布完成
results = await asyncio.gather(*tasks, return_exceptions=True)
# 统计结果
success_count = sum(1 for r in results if r is True)
fail_count = len(results) - success_count
cursor.close()
conn.close()
duration = (datetime.now() - start_time).total_seconds()
print("========== 定时发布任务完成 ==========", file=sys.stderr)
print(f"总计: {len(articles)} 篇, 成功: {success_count} 篇, 失败: {fail_count} 篇, 耗时: {duration:.1f}", file=sys.stderr)
except Exception as e:
print(f"[调度器] 定时任务异常: {str(e)}", file=sys.stderr)
import traceback
traceback.print_exc()
    async def _publish_article_with_semaphore(self, article: Dict, user_id: int,
                                              cursor, user_fail_count: Dict, paused_users: set):
        """Publish one article while holding the concurrency semaphore."""
        async with self.semaphore:
            # All tasks are created before any of them runs, so the pause set has to be
            # re-checked here, after earlier tasks have had a chance to fail
            if user_id in paused_users:
                print(f"用户 {user_id} 在本轮已暂停,跳过文案 ID: {article['id']}", file=sys.stderr)
                return False
            try:
                print(f"[调度器] 开始发布文案 {article['id']}: {article['title']}", file=sys.stderr)
                success = await self._publish_single_article(article, cursor)
                if not success:
                    user_fail_count[user_id] = user_fail_count.get(user_id, 0) + 1
                    if user_fail_count[user_id] >= self.max_failures_per_user_per_run:
                        paused_users.add(user_id)
                        print(f"用户 {user_id} 在本轮失败次数达到 {user_fail_count[user_id]} 次,暂停本轮后续发布", file=sys.stderr)
                    print(f"发布失败 [文案ID: {article['id']}, 标题: {article['title']}]", file=sys.stderr)
                    return False
                else:
                    print(f"发布成功 [文案ID: {article['id']}, 标题: {article['title']}]", file=sys.stderr)
                    return True
            except Exception as e:
                print(f"发布异常 [文案ID: {article['id']}]: {str(e)}", file=sys.stderr)
                return False
async def _publish_single_article(self, article: Dict, cursor) -> bool:
"""发布单篇文章"""
try:
# 1. 获取用户信息
user_id = article['publish_user_id'] or article['created_user_id']
cursor.execute("SELECT * FROM ai_users WHERE id = %s", (user_id,))
user = cursor.fetchone()
if not user:
self._update_article_status(cursor, article['id'], 'failed', '获取用户信息失败')
return False
# 2. 检查用户是否绑定小红书
if user['is_bound_xhs'] != 1:
self._update_article_status(cursor, article['id'], 'failed', '用户未绑定小红书账号')
return False
# 3. 获取author记录和Cookie
cursor.execute("""
SELECT * FROM ai_authors
WHERE phone = %s AND enterprise_id = %s AND channel = 1 AND status = 'active'
LIMIT 1
""", (user['phone'], user['enterprise_id']))
author = cursor.fetchone()
if not author or not author['xhs_cookie']:
self._update_article_status(cursor, article['id'], 'failed', '小红书Cookie已失效')
return False
# 4. 获取文章图片
cursor.execute("""
SELECT image_url FROM ai_article_images
WHERE article_id = %s
ORDER BY sort_order ASC
""", (article['id'],))
images = [img['image_url'] for img in cursor.fetchall() if img['image_url']]
# 5. 获取标签
cursor.execute("SELECT coze_tag FROM ai_article_tags WHERE article_id = %s LIMIT 1", (article['id'],))
tag_row = cursor.fetchone()
topics = []
if tag_row and tag_row['coze_tag']:
topics = self._parse_tags(tag_row['coze_tag'])
# 6. 解析Cookie并格式化
try:
# 数据库中存储的是完整的login_state JSON
login_state = json.loads(author['xhs_cookie'])
# 处理双重JSON编码的情况
if isinstance(login_state, str):
login_state = json.loads(login_state)
# 提取cookies字段兼容旧格式如果login_state本身就是cookies列表
if isinstance(login_state, dict) and 'cookies' in login_state:
# 新格式login_state对象包含cookies字段
cookies = login_state['cookies']
print(f" 从login_state提取cookies: {len(cookies) if isinstance(cookies, list) else 'unknown'}", file=sys.stderr)
elif isinstance(login_state, (list, dict)):
# 旧格式直接是cookies
cookies = login_state
print(f" 使用旧格式cookies无login_state包装", file=sys.stderr)
else:
raise ValueError(f"无法识别的Cookie存储格式: {type(login_state).__name__}")
# 验证cookies格式
if not isinstance(cookies, (list, dict)):
raise ValueError(f"Cookie必须是列表或字典格式当前类型: {type(cookies).__name__}")
# 格式化Cookie确保包含domain字段
cookies = self._format_cookies(cookies)
except Exception as e:
self._update_article_status(cursor, article['id'], 'failed', f'Cookie格式错误: {str(e)}')
return False
# 7. 从代理池获取代理(如果启用)
proxy = await self._fetch_proxy_from_pool()
if proxy:
print(f"[调度器] 使用代理: {proxy}", file=sys.stderr)
# 8. 生成随机User-Agent防指纹识别
user_agent = self._generate_random_user_agent() if self.enable_random_ua else None
if user_agent:
print(f"[调度器] 使用随机UA: {user_agent[:50]}...", file=sys.stderr)
# 9. 调用发布服务(增加超时控制)
try:
print(f"[调度器] 开始调用发布服务,超时设置: {self.publish_timeout}", file=sys.stderr)
result = await asyncio.wait_for(
self.login_service.publish_note(
title=article['title'],
content=article['content'],
images=images,
topics=topics,
cookies=cookies,
proxy=proxy,
user_agent=user_agent,
),
timeout=self.publish_timeout
)
except asyncio.TimeoutError:
error_msg = f'发布超时({self.publish_timeout}秒)'
print(f"[调度器] {error_msg}", file=sys.stderr)
self._update_article_status(cursor, article['id'], 'failed', error_msg)
return False
except Exception as e:
error_msg = f'调用发布服务异常: {str(e)}'
print(f"[调度器] {error_msg}", file=sys.stderr)
import traceback
traceback.print_exc()
self._update_article_status(cursor, article['id'], 'failed', error_msg)
return False
# 10. 更新状态
if result['success']:
self._update_article_status(cursor, article['id'], 'published', '发布成功')
return True
else:
error_msg = result.get('error', '未知错误')
self._update_article_status(cursor, article['id'], 'failed', error_msg)
return False
except Exception as e:
self._update_article_status(cursor, article['id'], 'failed', f'发布异常: {str(e)}')
return False
def _update_article_status(self, cursor, article_id: int, status: str, message: str = ''):
"""更新文章状态"""
try:
if status == 'published':
cursor.execute("""
UPDATE ai_articles
SET status = %s, publish_time = NOW(), updated_at = NOW()
WHERE id = %s
""", (status, article_id))
else:
cursor.execute("""
UPDATE ai_articles
SET status = %s, review_comment = %s, updated_at = NOW()
WHERE id = %s
""", (status, message, article_id))
cursor.connection.commit()
except Exception as e:
print(f"更新文章 {article_id} 状态失败: {str(e)}", file=sys.stderr)
def _limit_articles_per_user(self, articles: List[Dict], per_user_limit: int) -> List[Dict]:
"""限制每用户发文数"""
if per_user_limit <= 0:
return articles
grouped = {}
for art in articles:
user_id = art['publish_user_id'] or art['created_user_id']
if user_id not in grouped:
grouped[user_id] = []
grouped[user_id].append(art)
limited = []
for user_id, user_articles in grouped.items():
limited.extend(user_articles[:per_user_limit])
return limited
async def _filter_by_daily_and_hourly_limit(self, cursor, articles: List[Dict],
max_daily: int, max_hourly: int) -> List[Dict]:
"""按每日和每小时上限过滤文章"""
if max_daily <= 0 and max_hourly <= 0:
return articles
# 提取所有用户ID
user_ids = set()
for art in articles:
user_id = art['publish_user_id'] or art['created_user_id']
user_ids.add(user_id)
# 查询每用户已发布数量
user_daily_published = {}
user_hourly_published = {}
now = datetime.now()
today_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
current_hour_start = now.replace(minute=0, second=0, microsecond=0)
for user_id in user_ids:
# 查询当日已发布数量
if max_daily > 0:
cursor.execute("""
SELECT COUNT(*) as count FROM ai_articles
WHERE status = 'published' AND publish_time >= %s
AND (publish_user_id = %s OR (publish_user_id IS NULL AND created_user_id = %s))
""", (today_start, user_id, user_id))
user_daily_published[user_id] = cursor.fetchone()['count']
# 查询当前小时已发布数量
if max_hourly > 0:
cursor.execute("""
SELECT COUNT(*) as count FROM ai_articles
WHERE status = 'published' AND publish_time >= %s
AND (publish_user_id = %s OR (publish_user_id IS NULL AND created_user_id = %s))
""", (current_hour_start, user_id, user_id))
user_hourly_published[user_id] = cursor.fetchone()['count']
# 过滤超限文章
filtered = []
for art in articles:
user_id = art['publish_user_id'] or art['created_user_id']
# 检查每日上限
if max_daily > 0 and user_daily_published.get(user_id, 0) >= max_daily:
continue
# 检查每小时上限
if max_hourly > 0 and user_hourly_published.get(user_id, 0) >= max_hourly:
continue
filtered.append(art)
return filtered
def _parse_tags(self, tag_str: str) -> List[str]:
"""解析标签字符串"""
if not tag_str:
return []
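# Normalize separators: semicolons, spaces, and full-width commas all become ASCII commas.
# (Replacing an empty string would insert a comma between every character.)
tag_str = tag_str.replace(';', ',').replace(' ', ',').replace(',', ',')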
# 分割并清理
tags = []
for tag in tag_str.split(','):
tag = tag.strip()
if tag:
tags.append(tag)
return tags
    def _format_cookies(self, cookies) -> List[Dict]:
        """
        Normalize cookies; only non-standard formats are modified.
        Cookies already in Playwright's native format are returned unchanged.

        Args:
            cookies: Cookie data, either a list[dict] or a name-to-value dict.
        Returns:
            The normalized cookie list.
        """
        # A plain name-to-value dict is converted to a list of cookie dicts
        if isinstance(cookies, dict):
            cookies = [
                {
                    "name": name,
                    "value": value if isinstance(value, str) else str(value),
                    "domain": ".xiaohongshu.com",
                    "path": "/"
                }
                for name, value in cookies.items()
            ]
        # Anything other than a list is rejected
        if not isinstance(cookies, list):
            raise ValueError(f"Cookie必须是列表或字典格式,当前类型: {type(cookies).__name__}")
        # Nothing to do for an empty list
        if not cookies:
            print(" Cookie列表为空,直接返回", file=sys.stderr)
            return cookies
        # Playwright's native format (has name and value): return as-is
        if isinstance(cookies[0], dict) and 'name' in cookies[0] and 'value' in cookies[0]:
            print(f" 检测到Playwright原生格式,直接使用 ({len(cookies)} 个cookie)", file=sys.stderr)
            return cookies
        # Other formats: validate each entry and fill in missing basic fields
        formatted_cookies = []
        for cookie in cookies:
            if not isinstance(cookie, dict):
                raise ValueError(f"Cookie元素必须是字典格式,当前类型: {type(cookie).__name__}")
            # Entries that carry a url are left alone; otherwise default domain and path.
            # Work on a copy so the caller's dict is never mutated. The earlier
            # cookies.index(cookie) lookup raised ValueError once the copy had diverged.
            if 'url' not in cookie:
                cookie = cookie.copy()
                cookie.setdefault('domain', '.xiaohongshu.com')
                cookie.setdefault('path', '/')
            formatted_cookies.append(cookie)
        return formatted_cookies
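
As written, `_filter_by_daily_and_hourly_limit` compares each user's already-published count with the cap but does not count articles accepted earlier in the same batch, so a user one article below the cap could still pass several pending articles in one run (subject to the separate per-run limit). A minimal sketch of a budget-aware variant follows; `filter_by_remaining_budget` and its `already_published` argument are hypothetical names, not part of the scheduler.

```python
from typing import Dict, List


def filter_by_remaining_budget(articles: List[Dict],
                               already_published: Dict[int, int],
                               limit: int) -> List[Dict]:
    """Keep at most (limit - already_published[user]) articles per user.

    Hypothetical helper: it also counts articles accepted earlier in the
    same batch, so the cap holds within a single scheduler run.
    """
    if limit <= 0:  # same convention as the scheduler: non-positive means no limit
        return articles
    used = dict(already_published)  # copy so the caller's counts stay untouched
    kept = []
    for art in articles:
        user_id = art['publish_user_id'] or art['created_user_id']
        if used.get(user_id, 0) >= limit:
            continue
        used[user_id] = used.get(user_id, 0) + 1
        kept.append(art)
    return kept
```

The same function could be applied twice, once with the daily counts and once with the hourly counts, to mirror the two checks in the scheduler.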


@@ -0,0 +1,55 @@
import requests
from damai_proxy_config import get_proxy_config
def test_single_proxy(index):
"""测试单个代理"""
try:
# 获取代理配置
proxy_info = get_proxy_config(index)
proxy_server = proxy_info['server'].replace('http://', '')
proxy_url = f"http://{proxy_info['username']}:{proxy_info['password']}@{proxy_server}"
proxies = {
'http': proxy_url,
'https': proxy_url
}
print(f'🔍 测试代理 {index + 1}: {proxy_info["server"]}')
# 测试连接
response = requests.get('http://httpbin.org/ip', proxies=proxies, timeout=10)
if response.status_code == 200:
print(f'✅ 代理 {index + 1} 连接成功! 状态码: {response.status_code}')
print(f'🌐 IP信息: {response.text}')
return True
else:
print(f'❌ 代理 {index + 1} 连接失败! 状态码: {response.status_code}')
return False
except requests.exceptions.ProxyError:
print(f'❌ 代理 {index + 1} 连接错误:无法连接到代理服务器')
return False
except requests.exceptions.ConnectTimeout:
print(f'❌ 代理 {index + 1} 连接超时')
return False
except Exception as e:
print(f'❌ 代理 {index + 1} 连接失败: {str(e)}')
return False
if __name__ == "__main__":
print("🚀 开始测试固定代理IP连接性\n")
# 测试两个代理
for i in range(2):
success = test_single_proxy(i)
if success:
print(f"✅ 代理 {i+1} 可用,适用于小红书登录发文\n")
else:
print(f"❌ 代理 {i+1} 不可用\n")
if i == 0: # 在测试第二个之前稍等一下
import time
time.sleep(2)
print("测试完成!")
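
The proxy URL above is assembled by plain string formatting, which breaks if a username or password contains reserved characters such as `@` or `:`. A minimal sketch of a safer builder, assuming the same `server`/`username`/`password` fields as `get_proxy_config` returns; `build_proxy_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def build_proxy_url(server: str, username: str, password: str) -> str:
    """Build an http proxy URL with percent-encoded credentials (hypothetical helper)."""
    # Drop any scheme prefix so only host:port remains
    host = server.split('://', 1)[-1]
    # safe='' forces reserved characters such as '@' and ':' to be encoded
    return f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}"
```

The result can be used for both the `http` and `https` keys of the `proxies` dict passed to `requests.get`.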


@@ -1,8 +1,9 @@
@echo off
echo 正在激活虚拟环境...
venv\Scripts\activate
call venv\Scripts\activate.bat
echo 正在启动小红书登录服务...
echo 正在启动小红书登录服务(开发环境)...
set "ENV=dev"
python main.py
pause


@@ -1,7 +1,33 @@
#!/bin/bash
# 小红书Python服务启动脚本开发环境
# 用途:前台启动,方便查看日志
echo "正在激活虚拟环境..."
cd "$(dirname "$0")"
echo "========================================"
echo " 小红书登录服务(开发模式)"
echo "========================================"
echo ""
# 激活虚拟环境
echo "[环境] 激活虚拟环境: $(pwd)/venv"
source venv/bin/activate
if [ $? -ne 0 ]; then
echo "[错误] 虚拟环境激活失败"
exit 1
fi
echo "正在启动小红书登录服务..."
python main.py
# 显示Python版本和路径
echo "[Python] $(python --version)"
echo "[路径] $(which python)"
echo ""
echo "[启动] 正在启动Python服务端口8000..."
echo "[说明] 按Ctrl+C停止服务"
echo ""
# 设置环境为开发环境
export ENV=dev
# 启动服务开发模式不使用reload
python -m uvicorn main:app --host 0.0.0.0 --port 8000

9
backend/start_prod.bat Normal file

@@ -0,0 +1,9 @@
@echo off
echo 正在激活虚拟环境...
call venv\Scripts\activate.bat
echo 正在启动小红书登录服务(生产环境)...
set "ENV=prod"
python main.py
pause

32
backend/start_prod.sh Normal file

@@ -0,0 +1,32 @@
#!/bin/bash
# 小红书Python服务启动脚本生产环境
cd "$(dirname "$0")"
echo "========================================"
echo " 小红书登录服务(生产模式)"
echo "========================================"
echo ""
# 激活虚拟环境
echo "[环境] 激活虚拟环境: $(pwd)/venv"
source venv/bin/activate
if [ $? -ne 0 ]; then
echo "[错误] 虚拟环境激活失败"
exit 1
fi
# 显示Python版本和路径
echo "[Python] $(python --version)"
echo "[路径] $(which python)"
echo ""
echo "[启动] 正在启动Python服务生产环境端口8000..."
echo "[说明] 按Ctrl+C停止服务"
echo ""
# 设置环境为生产环境
export ENV=prod
# 启动服务(生产模式)
python -m uvicorn main:app --host 0.0.0.0 --port 8000

66
backend/start_service.bat Normal file

@@ -0,0 +1,66 @@
@echo off
setlocal enabledelayedexpansion
chcp 65001 >nul
echo ====================================
echo 小红书登录服务(浏览器池模式)
echo ====================================
echo.
cd /d %~dp0
REM 检查虚拟环境
if not exist "venv\Scripts\python.exe" (
echo [错误] 未找到虚拟环境,请先运行: python -m venv venv
pause
exit /b 1
)
REM 检查并清理端口8000占用
echo [检查] 正在检查端口8000占用情况...
for /f "tokens=5" %%a in ('netstat -ano ^| findstr :8000 ^| findstr LISTENING') do (
echo [清理] 发现端口8000被进程%%a占用正在清理...
taskkill /F /PID %%a >nul 2>&1
if !errorlevel! equ 0 (
echo [成功] 已清理进程%%a
) else (
echo [警告] 无法清理进程%%a可能需要管理员权限
)
)
REM 等待端口释放
timeout /t 1 /nobreak >nul
echo.
echo [启动] 正在启动Python服务端口8000...
echo [模式] 浏览器池模式 - 性能优化
echo [说明] 浏览器实例将在30分钟无操作后自动清理
echo.
REM 激活虚拟环境并启动服务
echo [Environment] Using virtual environment: %~dp0venv
call "%~dp0venv\Scripts\activate.bat"
if !errorlevel! neq 0 (
echo [错误] 虚拟环境激活失败
pause
exit /b 1
)
REM 显示Python版本和路径
echo.
echo [Python Version]
python --version
echo [Python Path]
where python
echo.
REM 确认使用虚拟环境的Python
echo [Verify] Checking virtual environment...
python -c "import sys; print('Python executable:', sys.executable)"
echo.
REM 启动服务使用虚拟环境的uvicorn
echo [Service] Starting FastAPI service...
echo [Notice] Reload mode disabled for Windows compatibility
python -m uvicorn main:app --host 0.0.0.0 --port 8000
pause

50
backend/start_service.sh Normal file

@@ -0,0 +1,50 @@
#!/bin/bash
echo "===================================="
echo " 小红书登录服务(浏览器池模式)"
echo "===================================="
echo ""
cd "$(dirname "$0")"
# 检查虚拟环境
if [ ! -f "venv/bin/python" ]; then
echo "[错误] 未找到虚拟环境,请先运行: python3 -m venv venv"
exit 1
fi
# 检查并清理端口8000占用
echo "[检查] 正在检查端口8000占用情况..."
PID=$(lsof -ti:8000)
if [ ! -z "$PID" ]; then
echo "[清理] 发现端口8000被进程$PID占用,正在清理..."
kill -9 $PID 2>/dev/null
if [ $? -eq 0 ]; then
echo "[成功] 已清理进程$PID"
else
echo "[警告] 无法清理进程$PID可能需要sudo权限"
fi
sleep 1
fi
echo ""
echo "[启动] 正在启动Python服务端口8000..."
echo "[模式] 浏览器池模式 - 性能优化"
echo "[说明] 浏览器实例将在30分钟无操作后自动清理"
echo ""
# 激活虚拟环境
echo "[环境] 激活虚拟环境: $(pwd)/venv"
source venv/bin/activate
if [ $? -ne 0 ]; then
echo "[错误] 虚拟环境激活失败"
exit 1
fi
# 显示Python版本和路径
echo "[Python] $(python --version)"
echo "[路径] $(which python)"
echo ""
# 启动服务使用虚拟环境的uvicorn
echo "[Notice] Reload mode disabled"
python -m uvicorn main:app --host 0.0.0.0 --port 8000

9
backend/stop.sh Normal file

@@ -0,0 +1,9 @@
#!/bin/bash
# 小红书Python服务停止脚本
# 用途:停止生产环境服务
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
cd "$SCRIPT_DIR"
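# start_prod.sh has no "stop" subcommand (it always starts the service),
# so stop whatever process is listening on port 8000 instead
PID=$(lsof -ti:8000)
if [ -n "$PID" ]; then
    kill $PID
fi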


@@ -0,0 +1,137 @@
"""
小红书Storage State文件管理工具
用于管理和清理storage_state文件
"""
import os
import json
import time
from datetime import datetime, timedelta
from pathlib import Path
STORAGE_DIR = "storage_states"
def get_storage_files():
"""获取所有storage_state文件"""
if not os.path.exists(STORAGE_DIR):
return []
files = []
for filename in os.listdir(STORAGE_DIR):
if filename.endswith('.json'):
filepath = os.path.join(STORAGE_DIR, filename)
stat = os.stat(filepath)
files.append({
'filename': filename,
'filepath': filepath,
'size': stat.st_size,
'modified_time': stat.st_mtime,
'modified_date': datetime.fromtimestamp(stat.st_mtime)
})
return files
def cleanup_old_files(days=30):
"""清理超过指定天数未使用的文件"""
files = get_storage_files()
cutoff_time = time.time() - (days * 24 * 60 * 60)
deleted_count = 0
print(f"\n开始清理{days}天前的storage_state文件...")
for file_info in files:
if file_info['modified_time'] < cutoff_time:
try:
os.remove(file_info['filepath'])
print(f" 已删除: {file_info['filename']} (最后修改: {file_info['modified_date']})")
deleted_count += 1
except Exception as e:
print(f" 删除失败 {file_info['filename']}: {e}")
print(f"\n清理完成!共删除 {deleted_count} 个文件")
return deleted_count
def list_storage_files():
"""列出所有storage_state文件"""
files = get_storage_files()
if not files:
print("\n未找到任何storage_state文件")
return
print(f"\n找到 {len(files)} 个storage_state文件:\n")
print(f"{'文件名':<40} {'大小':<10} {'最后修改时间'}")
print("-" * 80)
for file_info in sorted(files, key=lambda x: x['modified_time'], reverse=True):
size_kb = file_info['size'] / 1024
print(f"{file_info['filename']:<40} {size_kb:>8.1f}KB {file_info['modified_date']}")
total_size = sum(f['size'] for f in files) / 1024 / 1024
print(f"\n总大小: {total_size:.2f} MB")
def validate_storage_file(phone):
"""验证指定手机号的storage_state文件是否有效"""
filepath = os.path.join(STORAGE_DIR, f"xhs_{phone}.json")
if not os.path.exists(filepath):
print(f"\n❌ 文件不存在: {filepath}")
return False
try:
with open(filepath, 'r', encoding='utf-8') as f:
data = json.load(f)
# 检查必要字段
if 'cookies' not in data:
print(f"\n❌ 文件格式错误: 缺少cookies字段")
return False
if 'origins' not in data:
print(f"\n⚠️ 文件格式不完整: 缺少origins字段")
cookie_count = len(data.get('cookies', []))
print(f"\n✅ 文件有效")
print(f" Cookie数量: {cookie_count}")
print(f" 文件大小: {os.path.getsize(filepath) / 1024:.1f}KB")
print(f" 最后修改: {datetime.fromtimestamp(os.path.getmtime(filepath))}")
return True
except json.JSONDecodeError:
print(f"\n❌ 文件格式错误: 不是有效的JSON")
return False
except Exception as e:
print(f"\n❌ 验证失败: {e}")
return False
if __name__ == "__main__":
import sys
if len(sys.argv) < 2:
print("用法:")
print(" python storage_state_manager.py list # 列出所有文件")
print(" python storage_state_manager.py cleanup [days] # 清理旧文件默认30天")
print(" python storage_state_manager.py validate <phone> # 验证指定手机号的文件")
sys.exit(1)
command = sys.argv[1]
if command == "list":
list_storage_files()
elif command == "cleanup":
days = int(sys.argv[2]) if len(sys.argv) > 2 else 30
cleanup_old_files(days)
elif command == "validate":
if len(sys.argv) < 3:
print("错误: 请提供手机号")
sys.exit(1)
phone = sys.argv[2]
validate_storage_file(phone)
else:
print(f"未知命令: {command}")
sys.exit(1)
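
`cleanup_old_files` decides what to delete by comparing each file's modification time with a cutoff. Pulling that comparison into a small pure function makes it testable without touching the filesystem and makes a dry run easy to add. This is a sketch only; `is_stale` and its `now` parameter are hypothetical and not part of the tool above.

```python
import time
from typing import Optional


def is_stale(modified_time: float, days: int, now: Optional[float] = None) -> bool:
    """Return True if modified_time is more than `days` days before `now`.

    Hypothetical helper; `now` defaults to the current time and exists
    so the check can be tested deterministically.
    """
    if now is None:
        now = time.time()
    cutoff_time = now - days * 24 * 60 * 60
    return modified_time < cutoff_time
```

A dry run could then list `[f for f in get_storage_files() if is_stale(f['modified_time'], days)]` before anything is removed.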

33
backend/test.py Normal file

@@ -0,0 +1,33 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Send a request through a proxy server using requests.
Works for both http and https pages.
"""
import requests
proxy_ip = "36.137.177.131:50001"
# Username/password authentication (private or dedicated proxy)
username = "qqwvy0"
password = "mun3r7xz"
proxies = {
    "http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
    "https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip}
}
print(proxies)
# Target page to request
target_url = "https://creator.xiaohongshu.com/login"
# Send the request through the proxy; the timeout keeps a dead proxy from hanging the script
response = requests.get(target_url, proxies=proxies, timeout=10)
# Print the page content on success
if response.status_code == 200:
    print(response.text)


@@ -0,0 +1,170 @@
"""
基础浏览器测试脚本
用于测试浏览器是否能正常加载小红书页面
"""
import asyncio
from playwright.async_api import async_playwright
import sys
async def test_basic_browser(proxy_index: int = 0):
"""基础浏览器测试"""
print(f"\n{'='*60}")
print(f"🔍 基础浏览器测试")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
try:
async with async_playwright() as p:
# 配置代理
proxy_parts = proxy_url.replace('http://', '').replace('https://', '').split('@')
if len(proxy_parts) == 2:
auth_part = proxy_parts[0]
server_part = proxy_parts[1]
username, password = auth_part.split(':', 1)  # split once so a ':' inside the password is kept
proxy_config_obj = {
"server": f"http://{server_part}",
"username": username,
"password": password
}
else:
proxy_config_obj = {"server": proxy_url}
print(f" 配置的代理对象: {proxy_config_obj}")
# 启动浏览器
browser = await p.chromium.launch(
headless=False, # 非无头模式,便于观察
proxy=proxy_config_obj
)
# 创建上下文
context = await browser.new_context(
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)
# 创建页面
page = await context.new_page()
print(f"\n🌐 尝试访问百度...")
try:
await page.goto('https://www.baidu.com', wait_until='networkidle', timeout=15000)
await asyncio.sleep(2)
title = await page.title()
url = page.url
content_len = len(await page.content())
print(f" ✅ 百度访问成功")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
except Exception as e:
print(f" ❌ 百度访问失败: {str(e)}")
print(f"\n🌐 尝试访问小红书登录页...")
try:
await page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=15000)
await asyncio.sleep(5) # 等待更长时间
title = await page.title()
url = page.url
content = await page.content()
content_len = len(content)
print(f" 访问结果:")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
# 检查是否有特定内容
if content_len == 0:
print(f" ⚠️ 页面内容为空,可能存在加载问题")
elif "验证" in content or "captcha" in content.lower() or "安全" in content:
print(f" ⚠️ 检测到验证或安全提示")
else:
print(f" ✅ 页面加载正常")
# 查找页面上的所有元素
print(f"\n🔍 分析页面元素...")
# 查找所有input元素
inputs = await page.query_selector_all('input')
print(f" 找到 {len(inputs)} 个input元素")
# 查找所有表单相关元素
form_elements = await page.query_selector_all('input, button, select, textarea')
print(f" 找到 {len(form_elements)} 个表单相关元素")
# 打印前几个元素的信息
for i, elem in enumerate(form_elements[:5]):
try:
tag = await elem.evaluate('el => el.tagName')
text = await elem.inner_text()
placeholder = await elem.get_attribute('placeholder')
class_name = await elem.get_attribute('class')
id_attr = await elem.get_attribute('id')
print(f" 元素 {i+1}:")
print(f" - 标签: {tag}")
print(f" - 文本: {text[:50]}...")
print(f" - placeholder: {placeholder}")
print(f" - class: {(class_name or '')[:50]}...")  # get_attribute returns None when the attribute is absent
print(f" - id: {id_attr}")
except Exception as e:
print(f" 元素 {i+1}: 获取信息失败 - {str(e)}")
except Exception as e:
print(f" ❌ 小红书访问失败: {str(e)}")
import traceback
traceback.print_exc()
print(f"\n⏸️ 浏览器保持打开状态,您可以手动检查页面")
print(f" 按 Enter 键关闭浏览器...")
# 等待用户输入
input()
await browser.close()
print(f"✅ 浏览器已关闭")
except Exception as e:
print(f"❌ 测试过程异常: {str(e)}")
import traceback
traceback.print_exc()
async def main():
"""主函数"""
print("="*60)
print("🔍 基础浏览器测试工具")
print("="*60)
proxy_choice = input("\n请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
await test_basic_browser(proxy_idx)
print(f"\n{'='*60}")
print("✅ 测试完成!")
print("="*60)
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())
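
The test above builds a proxy URL and then splits it by hand on `@` and `:` to obtain the `server`, `username`, and `password` keys that Playwright's `proxy` option expects. A sketch of the same conversion using the standard URL parser, which also copes with `@` or `:` inside the password; `to_playwright_proxy` is a hypothetical helper name.

```python
from urllib.parse import unquote, urlsplit


def to_playwright_proxy(proxy_url: str) -> dict:
    """Convert a proxy URL into a dict with server/username/password keys (hypothetical helper)."""
    parts = urlsplit(proxy_url)
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        server += f":{parts.port}"
    proxy = {"server": server}
    if parts.username is not None:
        # urlsplit leaves percent-encoding in place, so decode it here
        proxy["username"] = unquote(parts.username)
        proxy["password"] = unquote(parts.password or "")
    return proxy
```

The returned dict could be passed as `proxy=` to `p.chromium.launch(...)` in place of the hand-built `proxy_config_obj`.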


@@ -0,0 +1,213 @@
"""
测试修复后的浏览器池
验证预热超时问题是否已解决
"""
import asyncio
import sys
from xhs_login import XHSLoginService
async def test_browser_pool_with_proxy(proxy_index: int = 0):
"""测试修复后的浏览器池"""
print(f"\n{'='*60}")
print(f"🔧 测试修复后的浏览器池")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
print(f" 代理URL: {proxy_url}")
# 创建登录服务(使用浏览器池)
login_service = XHSLoginService(use_pool=True) # 使用浏览器池
try:
print(f"\n🚀 初始化浏览器(使用代理 + 浏览器池)...")
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
print("✅ 浏览器初始化成功")
# 检查浏览器池状态
browser_pool = login_service.browser_pool
if browser_pool:
stats = browser_pool.get_stats()
print(f"\n📊 浏览器池状态:")
print(f" 主浏览器存活: {stats['browser_alive']}")
print(f" 上下文存活: {stats['context_alive']}")
print(f" 页面存活: {stats['page_alive']}")
print(f" 是否预热: {stats['is_preheated']}")
print(f" 临时浏览器数: {stats['temp_browsers_count']}")
# 访问小红书登录页面
print(f"\n🌐 访问小红书创作者平台...")
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='domcontentloaded', timeout=30000)
await asyncio.sleep(2)
title = await login_service.page.title()
url = login_service.page.url
content_len = len(await login_service.page.content())
print(f"✅ 访问成功")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
# 检查关键元素
phone_input = await login_service.page.query_selector('input[placeholder="手机号"]')
if phone_input:
print(f"✅ 找到手机号输入框")
else:
print(f"❌ 未找到手机号输入框")
# 查找所有input元素
inputs = await login_service.page.query_selector_all('input')
print(f" 共找到 {len(inputs)} 个input元素")
if content_len == 0:
print(f"⚠️ 页面内容为空")
else:
print(f"✅ 页面内容正常加载")
return True
except Exception as e:
print(f"❌ 测试失败: {str(e)}")
import traceback
traceback.print_exc()
return False
finally:
await login_service.close_browser()
async def test_multiple_requests(proxy_index: int = 0):
"""测试多个请求复用浏览器池"""
print(f"\n{'='*60}")
print(f"🔄 测试浏览器池复用")
print(f"{'='*60}")
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
success_count = 0
for i in range(3):
print(f"\n🧪 请求 {i+1}/3")
login_service = XHSLoginService(use_pool=True)
try:
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
# 访问页面
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='domcontentloaded', timeout=30000)
await asyncio.sleep(1)
content_len = len(await login_service.page.content())
if content_len > 0:
print(f" ✅ 请求 {i+1} 成功,内容长度: {content_len}")
success_count += 1
else:
print(f" ❌ 请求 {i+1} 失败,内容为空")
except Exception as e:
print(f" ❌ 请求 {i+1} 异常: {str(e)}")
finally:
await login_service.close_browser()
# 等待一下避免请求过于频繁
if i < 2:
await asyncio.sleep(1)
print(f"\n📈 测试结果: {success_count}/3 请求成功")
return success_count == 3
def explain_fix():
"""解释修复内容"""
print("="*60)
print("🔧 修复内容说明")
print("="*60)
print("\n修复的两个问题:")
print("1. 增加超时时间: 从30秒增加到45秒")
print("2. 修改等待策略: 从'networkidle'改为'domcontentloaded'")
print(" - 'networkidle': 等待网络空闲(可能等待时间过长)")
print(" - 'domcontentloaded': DOM内容加载完成更快更稳定")
print("\n浏览器池优化效果:")
print("✅ 减少预热超时错误")
print("✅ 提高页面加载成功率")
print("✅ 保持浏览器常驻,提升性能")
async def main():
"""主函数"""
explain_fix()
print(f"\n{'='*60}")
print("🎯 选择测试模式")
print("="*60)
print("\n1. 单次浏览器池测试")
print("2. 多请求复用测试")
print("3. 全部测试")
try:
choice = input("\n请选择测试模式 (1-3, 默认为3): ").strip()
if choice not in ['1', '2', '3']:
choice = '3'
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
if choice in ['1', '3']:
print(f"\n{'-'*40}")
print("测试1: 单次浏览器池测试")
success1 = await test_browser_pool_with_proxy(proxy_idx)
if choice in ['2', '3']:
print(f"\n{'-'*40}")
print("测试2: 多请求复用测试")
success2 = await test_multiple_requests(proxy_idx)
if choice == '3':
overall_success = success1 and success2
elif choice == '1':
overall_success = success1
else:
overall_success = success2
print(f"\n{'='*60}")
if overall_success:
print("✅ 所有测试通过!浏览器池预热问题已修复")
else:
print("❌ 部分测试失败,请检查配置")
print("="*60)
except KeyboardInterrupt:
print("\n\n⚠️ 测试被用户中断")
except Exception as e:
print(f"\n❌ 测试过程中出现错误: {str(e)}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())


@@ -0,0 +1,313 @@
"""
测试Cookie格式处理修复
验证scheduler.py中的_format_cookies方法能正确处理各种Cookie格式
"""
import json
from typing import List, Dict
def _format_cookies(cookies) -> List[Dict]:
    """
    Normalize cookies; only non-standard formats are modified.
    Cookies already in Playwright's native format are returned unchanged.
    This is a copy of the _format_cookies method in scheduler.py, kept here for standalone testing.

    Args:
        cookies: Cookie data, either a list[dict] or a name-to-value dict.
    Returns:
        The normalized cookie list.
    """
    # A plain name-to-value dict is converted to a list of cookie dicts
    if isinstance(cookies, dict):
        cookies = [
            {
                "name": name,
                "value": value if isinstance(value, str) else str(value),
                "domain": ".xiaohongshu.com",
                "path": "/"
            }
            for name, value in cookies.items()
        ]
    # Anything other than a list is rejected
    if not isinstance(cookies, list):
        raise ValueError(f"Cookie必须是列表或字典格式,当前类型: {type(cookies).__name__}")
    # Playwright's native format (has name and value): return as-is
    if cookies and isinstance(cookies[0], dict) and 'name' in cookies[0] and 'value' in cookies[0]:
        return cookies
    # Other formats: validate each entry and fill in missing basic fields
    formatted_cookies = []
    for cookie in cookies:
        if not isinstance(cookie, dict):
            raise ValueError(f"Cookie元素必须是字典格式,当前类型: {type(cookie).__name__}")
        # Entries that carry a url are left alone; otherwise default domain and path.
        # Work on a copy so the caller's dict is never mutated. The earlier
        # cookies.index(cookie) lookup raised ValueError once the copy had diverged.
        if 'url' not in cookie:
            cookie = cookie.copy()
            cookie.setdefault('domain', '.xiaohongshu.com')
            cookie.setdefault('path', '/')
        formatted_cookies.append(cookie)
    return formatted_cookies
def test_format_cookies():
"""测试_format_cookies方法"""
print("="*60)
print("测试 Cookie 格式处理")
print("="*60)
# 测试1: 字典格式(键值对)
print("\n测试 1: 字典格式(键值对)")
cookies_dict = {
"a1": "xxx",
"webId": "yyy",
"web_session": "zzz"
}
try:
result = _format_cookies(cookies_dict)
print(f"✅ 成功处理字典格式")
print(f" 输入: {type(cookies_dict).__name__} with {len(cookies_dict)} items")
print(f" 输出: {type(result).__name__} with {len(result)} items")
print(f" 第一个Cookie: {result[0]}")
assert isinstance(result, list)
assert len(result) == 3
assert all('name' in c and 'value' in c and 'domain' in c for c in result)
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试2: 列表格式(完整格式已有domain和path)
print("\n测试 2: 列表格式(完整格式)")
cookies_list_full = [
{
"name": "a1",
"value": "xxx",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": -1,
"httpOnly": False,
"secure": False,
"sameSite": "Lax"
}
]
try:
result = _format_cookies(cookies_list_full)
print(f"✅ 成功处理完整列表格式")
print(f" 输入: {type(cookies_list_full).__name__} with {len(cookies_list_full)} items")
print(f" 输出: {type(result).__name__} with {len(result)} items")
# 验证Playwright原生格式被完整保留
print(f" 保留的字段: {list(result[0].keys())}")
assert result == cookies_list_full, "Playwright原生格式应该被完整保留不做任何修改"
assert 'expires' in result[0], "expires字段应该被保留"
assert result[0]['expires'] == -1, "expires=-1应该被保留"
assert isinstance(result, list)
assert len(result) == 1
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试3: 非Playwright格式缺少name字段需要补充domain和path
print("\n测试 3: 非Playwright格式缺少字段需要补充")
cookies_list_partial = [
{
"cookie_name": "a1", # 没有name字段不是Playwright格式
"cookie_value": "xxx"
}
]
try:
result = _format_cookies(cookies_list_partial)
print(f"✅ 成功处理非Playwright格式")
print(f" 输入: {type(cookies_list_partial).__name__} with {len(cookies_list_partial)} items")
print(f" 输出: {type(result).__name__} with {len(result)} items")
print(f" 自动添加的字段: domain={result[0].get('domain')}, path={result[0].get('path')}")
assert isinstance(result, list)
# 应该自动添加domain和path
assert result[0]['domain'] == '.xiaohongshu.com'
assert result[0]['path'] == '/'
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试4: 双重JSON编码(模拟数据库存储场景)
print("\n测试 4: 双重JSON编码字符串")
cookies_dict = {"a1": "xxx", "webId": "yyy"}
# 第一次JSON编码
cookies_json_1 = json.dumps(cookies_dict)
# 第二次JSON编码
cookies_json_2 = json.dumps(cookies_json_1)
print(f" 原始字典: {cookies_dict}")
print(f" 第一次编码: {cookies_json_1}")
print(f" 第二次编码: {cookies_json_2}")
# 模拟从数据库读取并解析
try:
# 第一次解析
cookies_parsed_1 = json.loads(cookies_json_2)
print(f" 第一次解析后类型: {type(cookies_parsed_1).__name__}")
# 处理双重编码
if isinstance(cookies_parsed_1, str):
cookies_parsed_2 = json.loads(cookies_parsed_1)
print(f" 第二次解析后类型: {type(cookies_parsed_2).__name__}")
cookies = cookies_parsed_2
else:
cookies = cookies_parsed_1
# 格式化
result = _format_cookies(cookies)
print(f"✅ 成功处理双重JSON编码")
print(f" 最终输出: {type(result).__name__} with {len(result)} items")
assert isinstance(result, list)
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试5: 错误格式 - 字符串(不是JSON)
print("\n测试 5: 错误格式 - 普通字符串")
try:
result = _format_cookies("invalid_string")
print(f"❌ 应该抛出异常但没有")
except ValueError as e:
print(f"✅ 正确抛出ValueError异常")
print(f" 错误信息: {str(e)}")
except Exception as e:
print(f"❌ 抛出了非预期的异常: {str(e)}")
# 测试6: 错误格式 - 列表中包含非字典元素
print("\n测试 6: 错误格式 - 列表中包含非字典元素")
try:
result = _format_cookies(["string_item", 123])
print(f"❌ 应该抛出异常但没有")
except ValueError as e:
print(f"✅ 正确抛出ValueError异常")
print(f" 错误信息: {str(e)}")
except Exception as e:
print(f"❌ 抛出了非预期的异常: {str(e)}")
# 测试7: Playwright原生格式中value为对象保持原样
print("\n测试 7: Playwright原生格式中value为对象应保持原样")
cookies_with_object_value = [
{
"name": "test_cookie",
"value": {"nested": "object"}, # value是对象
"domain": ".xiaohongshu.com",
"path": "/"
}
]
try:
result = _format_cookies(cookies_with_object_value)
print(f"✅ Playwright原生格式被完整保留")
print(f" 输入value类型: {type(cookies_with_object_value[0]['value']).__name__}")
print(f" 输出value类型: {type(result[0]['value']).__name__}")
print(f" 输出value内容: {result[0]['value']}")
# Playwright's native format is left completely unmodified, including value
assert result == cookies_with_object_value, "Playwright原生格式应完整保留"
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试8: 字典格式中value为数字
print("\n测试 8: 字典格式中value为数字应自动转换为字符串")
cookies_dict_with_number = {
"a1": "xxx",
"user_id": 12345, # value是数字
"is_login": True # value是布尔值
}
try:
result = _format_cookies(cookies_dict_with_number)
print(f"✅ 成功处理数字/布尔value")
print(f" 输入: {cookies_dict_with_number}")
print(f" user_id value类型: {type(result[1]['value']).__name__}, 值: {result[1]['value']}")
print(f" is_login value类型: {type(result[2]['value']).__name__}, 值: {result[2]['value']}")
# 验证不再包含expires等字段
print(f" 字段: {list(result[0].keys())}")
assert all(isinstance(c['value'], str) for c in result), "所有value应该都是字符串类型"
assert 'expires' not in result[0], "不应该包含expires字段"
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试9: Playwright原生格式中expires=-1应被保留
print("\n测试 9: Playwright原生格式中expires=-1应被保留")
cookies_with_invalid_expires = [
{
"name": "test_cookie",
"value": "test_value",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": -1 # Playwright原生格式
}
]
try:
result = _format_cookies(cookies_with_invalid_expires)
print(f"✅ Playwright原生格式被完整保留")
print(f" 原始字段: {list(cookies_with_invalid_expires[0].keys())}")
print(f" 处理后字段: {list(result[0].keys())}")
assert result == cookies_with_invalid_expires, "Playwright原生格式应被完整保留"
assert 'expires' in result[0] and result[0]['expires'] == -1, "expires=-1应该被保留"
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试10: Playwright原生格式中expires为浮点数应被保留
print("\n测试 10: Playwright原生格式中expires为浮点数应被保留")
cookies_with_float_expires = [
{
"name": "test_cookie",
"value": "test_value",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": 1797066497.112584 # Playwright原生格式常常有浮点数
}
]
try:
result = _format_cookies(cookies_with_float_expires)
print(f"✅ Playwright原生格式被完整保留")
print(f" 原始expires: {cookies_with_float_expires[0]['expires']} (类型: {type(cookies_with_float_expires[0]['expires']).__name__})")
print(f" 处理后expires: {result[0]['expires']} (类型: {type(result[0]['expires']).__name__})")
assert result == cookies_with_float_expires, "Playwright原生格式应被完整保留"
assert isinstance(result[0]['expires'], float), "expires浮点数应该被保留"
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试11: Playwright原生格式中sameSite大小写应被保留
print("\n测试 11: Playwright原生格式中sameSite应被完整保留")
cookies_with_samesite = [
{
"name": "test_cookie1",
"value": "test_value1",
"domain": ".xiaohongshu.com",
"path": "/",
"sameSite": "Lax" # Playwright原生格式
},
{
"name": "test_cookie2",
"value": "test_value2",
"domain": ".xiaohongshu.com",
"path": "/",
"sameSite": "Strict"
}
]
try:
result = _format_cookies(cookies_with_samesite)
print(f"✅ Playwright原生格式被完整保留")
print(f" cookie1 sameSite: {result[0]['sameSite']}")
print(f" cookie2 sameSite: {result[1]['sameSite']}")
assert result == cookies_with_samesite, "Playwright原生格式应被完整保留"
assert result[0]['sameSite'] == 'Lax'
assert result[1]['sameSite'] == 'Strict'
except Exception as e:
print(f"❌ 失败: {str(e)}")
print("\n" + "="*60)
print("测试完成")
print("="*60)
if __name__ == "__main__":
test_format_cookies()
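
Test 4 above unwraps double-JSON-encoded cookies by hand with two `json.loads` calls. A minimal sketch of a reusable decoder for that step, which keeps decoding while the value is still a string; `decode_nested_json` and `max_depth` are hypothetical names chosen for illustration.

```python
import json


def decode_nested_json(raw, max_depth: int = 3):
    """Decode a value that may have been JSON-encoded more than once (hypothetical helper).

    Non-string input is returned unchanged. A string that is not valid
    JSON raises json.JSONDecodeError, as json.loads normally does.
    """
    value = raw
    for _ in range(max_depth):
        if not isinstance(value, str):
            break
        value = json.loads(value)
    return value
```

The decoded result can then be handed to `_format_cookies`, which accepts either a dict or a list.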


@@ -0,0 +1,31 @@
@echo off
chcp 65001 >nul
echo ========================================
echo 小红书Cookie注入测试工具
echo ========================================
echo.
echo 此工具使用Playwright真实注入Cookie
echo 支持验证Cookie有效性并跳转到指定页面
echo.
echo ========================================
echo.
cd /d %~dp0
REM 检查是否有cookies.json文件
if exist cookies.json (
echo 检测到 cookies.json 文件
echo.
python test_cookie_inject.py
) else (
echo 未找到 cookies.json 文件
echo 请先准备Cookie文件或在程序中手动输入
echo.
python test_cookie_inject.py
)
echo.
echo ========================================
echo 测试完成
echo ========================================
pause


@@ -0,0 +1,398 @@
"""
Cookie注入测试脚本
使用Playwright注入Cookie并验证其有效性
支持跳转到创作者中心或小红书首页
"""
import asyncio
import sys
import json
from pathlib import Path
from playwright.async_api import async_playwright
from typing import Optional, List, Dict, Any
class CookieInjector:
"""Cookie注入器"""
def __init__(self, headless: bool = False):
"""
初始化Cookie注入器
Args:
headless: 是否使用无头模式False可以看到浏览器界面
"""
self.headless = headless
self.playwright = None
self.browser = None
self.context = None
self.page = None
async def init_browser(self):
"""初始化浏览器"""
try:
print("正在启动浏览器...")
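# On Windows, request the Proactor event loop policy. A policy set here only applies to
# event loops created afterwards, not to the loop that is already running this coroutine.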
if sys.platform == 'win32':
try:
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
except Exception as e:
print(f"警告: 设置事件循环策略失败: {str(e)}")
self.playwright = await async_playwright().start()
# 启动浏览器
self.browser = await self.playwright.chromium.launch(
headless=self.headless,
args=['--disable-blink-features=AutomationControlled']
)
# 创建浏览器上下文
self.context = await self.browser.new_context(
viewport={'width': 1280, 'height': 720},
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)
# 创建新页面
self.page = await self.context.new_page()
print("浏览器初始化成功")
except Exception as e:
print(f"浏览器初始化失败: {str(e)}")
raise
async def inject_cookies(self, cookies: List[Dict[str, Any]]) -> bool:
"""
注入Cookie
Args:
cookies: Cookie列表
Returns:
是否注入成功
"""
try:
if not self.context:
await self.init_browser()
print(f"正在注入 {len(cookies)} 个Cookie...")
# 注入Cookie到浏览器上下文
await self.context.add_cookies(cookies)
print("Cookie注入成功")
return True
except Exception as e:
print(f"Cookie注入失败: {str(e)}")
return False
async def verify_and_navigate(self, target_page: str = 'creator') -> Dict[str, Any]:
"""
Verify the cookies and navigate to the given page.
Args:
target_page: target page type ('creator' or 'home')
Returns:
A dict describing the verification result.
"""
try:
if not self.page:
return {"success": False, "error": "浏览器未初始化"}
# 确定目标URL
urls = {
'creator': 'https://creator.xiaohongshu.com',
'home': 'https://www.xiaohongshu.com'
}
target_url = urls.get(target_page, urls['creator'])
page_name = '创作者中心' if target_page == 'creator' else '小红书首页'
print(f"\n正在访问{page_name}: {target_url}")
# 访问目标页面
await self.page.goto(target_url, wait_until='networkidle', timeout=30000)
await asyncio.sleep(2) # 等待页面完全加载
# 获取当前URL和标题
current_url = self.page.url
title = await self.page.title()
print(f"当前URL: {current_url}")
print(f"页面标题: {title}")
# 检查是否被重定向到登录页
is_logged_in = 'login' not in current_url.lower()
if is_logged_in:
print("Cookie验证成功已登录状态")
# 尝试获取用户信息
try:
# 等待用户相关元素出现(如头像、用户名等)
await self.page.wait_for_selector('[class*="avatar"], [class*="user"]', timeout=5000)
print("检测到用户信息元素,确认登录成功")
except Exception:
print("未检测到明显的用户信息元素,但未跳转到登录页")
return {
"success": True,
"message": f"Cookie有效已成功访问{page_name}",
"url": current_url,
"title": title,
"logged_in": True
}
else:
print("Cookie可能已失效页面跳转到登录页")
return {
"success": False,
"error": "Cookie已失效或无效页面跳转到登录页",
"url": current_url,
"title": title,
"logged_in": False
}
except Exception as e:
print(f"验证过程异常: {str(e)}")
import traceback
traceback.print_exc()
return {
"success": False,
"error": f"验证过程异常: {str(e)}"
}
async def keep_browser_open(self, duration: int = 60):
"""
保持浏览器打开一段时间,方便观察
Args:
duration: 保持打开的秒数0表示永久打开直到手动关闭
"""
try:
if duration == 0:
print("\n浏览器将保持打开,按 Ctrl+C 关闭...")
try:
while True:
await asyncio.sleep(1)
except KeyboardInterrupt:
print("\n用户中断,准备关闭浏览器...")
else:
print(f"\n浏览器将保持打开 {duration} 秒...")
await asyncio.sleep(duration)
print("时间到,准备关闭浏览器...")
except Exception as e:
print(f"保持浏览器异常: {str(e)}")
async def close_browser(self):
"""关闭浏览器"""
try:
print("\n正在关闭浏览器...")
if self.page:
await self.page.close()
if self.context:
await self.context.close()
if self.browser:
await self.browser.close()
if self.playwright:
await self.playwright.stop()
print("浏览器已关闭")
except Exception as e:
print(f"关闭浏览器异常: {str(e)}")
def load_cookies_from_file(file_path: str) -> Optional[List[Dict[str, Any]]]:
    """
    Load cookies from a JSON file.

    Args:
        file_path: path to the cookie file

    Returns:
        The list of cookies, or None on failure.
    """
    try:
        cookie_file = Path(file_path)
        if not cookie_file.exists():
            print(f"Cookie file does not exist: {file_path}")
            return None
        with open(cookie_file, 'r', encoding='utf-8') as f:
            cookies = json.load(f)
        if not isinstance(cookies, list):
            print("Invalid cookie format: the top-level value must be an array")
            return None
        if len(cookies) == 0:
            print("The cookie array is empty")
            return None
        # Every entry must be an object with both a name and a value
        for cookie in cookies:
            if not isinstance(cookie, dict) or not cookie.get('name') or not cookie.get('value'):
                print("Invalid cookie format: an entry is missing its name or value field")
                return None
        print(f"Loaded {len(cookies)} cookies")
        return cookies
    except json.JSONDecodeError as e:
        print(f"Failed to parse the cookie file as JSON: {str(e)}")
        return None
    except Exception as e:
        print(f"Failed to load the cookie file: {str(e)}")
        return None
async def test_cookie_inject(
    cookies_source: str,
    target_page: str = 'creator',
    headless: bool = False,
    keep_open: int = 0
):
    """
    Test cookie injection.

    Args:
        cookies_source: cookie source, either a file path or a JSON string
        target_page: target page ('creator' or 'home')
        headless: whether to run the browser headless
        keep_open: seconds to keep the browser open; 0 keeps it open until closed manually
    """
    print("="*60)
    print("Cookie injection and verification test")
    print("="*60)
    # Load the cookies
    cookies = None
    # A pasted JSON array starts with '['. Check this first, because calling
    # Path.exists() on a long JSON string can raise OSError on some platforms.
    if cookies_source.lstrip().startswith('['):
        try:
            print("\nParsing the cookie JSON string...")
            parsed = json.loads(cookies_source)
            if isinstance(parsed, list) and len(parsed) > 0:
                cookies = parsed
                print(f"Parsed {len(cookies)} cookies")
            else:
                print("Invalid cookie format: expected a non-empty array")
        except json.JSONDecodeError as e:
            print(f"Failed to parse the cookies: {str(e)}")
    else:
        print(f"\nLoading cookies from file: {cookies_source}")
        cookies = load_cookies_from_file(cookies_source)
    if not cookies:
        print("\nFailed to load cookies, please check the input")
        return
    # Create the injector
    injector = CookieInjector(headless=headless)
    try:
        # Initialize the browser
        await injector.init_browser()
        # Inject the cookies
        inject_success = await injector.inject_cookies(cookies)
        if not inject_success:
            print("\nCookie injection failed")
            return
        # Verify and navigate
        result = await injector.verify_and_navigate(target_page)
        print("\n" + "="*60)
        print("Verification result")
        print("="*60)
        if result.get('success'):
            print("Status: success")
            print(f"Message: {result.get('message')}")
            print(f"URL: {result.get('url')}")
            print(f"Title: {result.get('title')}")
            print(f"Login state: {'logged in' if result.get('logged_in') else 'not logged in'}")
        else:
            print("Status: failed")
            print(f"Error: {result.get('error')}")
            if result.get('url'):
                print(f"Current URL: {result.get('url')}")
        # Keep the browser open
        if keep_open >= 0:
            await injector.keep_browser_open(keep_open)
    except KeyboardInterrupt:
        print("\n\nTest interrupted by the user")
    except Exception as e:
        print(f"\nUnexpected error during the test: {str(e)}")
        import traceback
        traceback.print_exc()
    finally:
        await injector.close_browser()
    print("\n" + "="*60)
    print("Test finished")
    print("="*60)
async def main():
"""主函数"""
print("="*60)
print("小红书Cookie注入测试工具")
print("="*60)
print("\n功能说明:")
print("1. 注入Cookie到浏览器")
print("2. 验证Cookie有效性")
print("3. 跳转到指定页面(创作者中心/小红书首页)")
print("\n" + "="*60)
# 输入Cookie来源
print("\n请输入Cookie来源")
print("1. 输入Cookie文件路径如: cookies.json")
print("2. 直接粘贴JSON格式的Cookie")
cookies_source = input("\nCookie来源: ").strip()
if not cookies_source:
print("Cookie来源不能为空")
return
# 选择目标页面
print("\n请选择目标页面:")
print("1. 创作者中心creator.xiaohongshu.com")
print("2. 小红书首页www.xiaohongshu.com")
page_choice = input("\n选择 (1 或 2, 默认为 1): ").strip()
target_page = 'home' if page_choice == '2' else 'creator'
# 选择浏览器模式
headless_choice = input("\n是否使用无头模式?(y/n, 默认为 n): ").strip().lower()
headless = headless_choice == 'y'
# 选择保持打开时间
keep_open_input = input("\n保持浏览器打开时间0表示直到手动关闭默认60: ").strip()
try:
keep_open = int(keep_open_input) if keep_open_input else 60
except ValueError:
keep_open = 60
# 执行测试
await test_cookie_inject(
cookies_source=cookies_source,
target_page=target_page,
headless=headless,
keep_open=keep_open
)
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())
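
Review note: the script above only checks that each cookie has a name and a value before passing the list to `context.add_cookies`. Cookies exported from a browser extension often carry extra fields or a differently spelled `sameSite` value, which Playwright may reject. The following is a minimal sketch of a normalization step, not part of the commit. The helper name `normalize_cookies`, the default domain `.xiaohongshu.com`, and the export field names `expirationDate` and `no_restriction` are assumptions for illustration.

```python
from typing import Any, Dict, List

# Map of lowercase export spellings to the sameSite values Playwright accepts
_SAME_SITE = {"strict": "Strict", "lax": "Lax", "none": "None", "no_restriction": "None"}
# Cookie fields this sketch passes through to BrowserContext.add_cookies
_ALLOWED = ("name", "value", "url", "domain", "path", "expires", "httpOnly", "secure", "sameSite")


def normalize_cookies(
    raw: List[Dict[str, Any]],
    default_domain: str = ".xiaohongshu.com",  # assumed default, adjust as needed
) -> List[Dict[str, Any]]:
    """Hypothetical helper: coerce exported cookies into the shape add_cookies expects."""
    normalized = []
    for item in raw:
        cookie = dict(item)
        # Some exporters write 'expirationDate' instead of 'expires' (assumption)
        if "expires" not in cookie and "expirationDate" in cookie:
            cookie["expires"] = cookie["expirationDate"]
        # A cookie needs either a url or a domain/path pair
        if "url" not in cookie:
            cookie.setdefault("domain", default_domain)
            cookie.setdefault("path", "/")
        # Keep sameSite only when it maps to a known value, otherwise drop it
        same_site = str(cookie.get("sameSite", "")).lower()
        if same_site in _SAME_SITE:
            cookie["sameSite"] = _SAME_SITE[same_site]
        else:
            cookie.pop("sameSite", None)
        # Drop exporter-specific fields such as 'hostOnly' or 'storeId'
        normalized.append({k: v for k, v in cookie.items() if k in _ALLOWED})
    return normalized
```

Under these assumptions, calling `normalize_cookies(cookies)` right before `inject_cookies(cookies)` would keep the injection step from failing on exporter-specific fields.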

207
backend/test_damai_proxy.py Normal file

@@ -0,0 +1,207 @@
"""
Damai fixed proxy IP test script.
Tests whether the two fixed proxy IPs work in a headless browser.
"""
import asyncio
import sys
from playwright.async_api import async_playwright
# 大麦固定代理IP配置
DAMAI_PROXIES = [
{
"name": "大麦代理1",
"server": "http://36.137.177.131:50001",
"username": "qqwvy0",
"password": "mun3r7xz"
},
{
"name": "大麦代理2",
"server": "http://111.132.40.72:50002",
"username": "ih3z07",
"password": "078bt7o5"
}
]
async def test_proxy(proxy_config: dict):
"""
测试单个代理IP
Args:
proxy_config: 代理配置字典
"""
print(f"\n{'='*60}")
print(f"🔍 Starting test: {proxy_config['name']}")
print(f" Proxy server: {proxy_config['server']}")
# Do not echo the proxy password into the test output
print(f" Credentials: {proxy_config['username']} / ****")
print(f"{'='*60}")
playwright = None
browser = None
try:
# 启动Playwright
playwright = await async_playwright().start()
print("✅ Playwright启动成功")
# 配置代理
proxy_settings = {
"server": proxy_config["server"],
"username": proxy_config["username"],
"password": proxy_config["password"]
}
# Launch the browser (with the proxy)
print(f"🚀 Launching the browser (proxy: {proxy_config['server']})...")
browser = await playwright.chromium.launch(
headless=True,
proxy=proxy_settings,
args=[
'--disable-blink-features=AutomationControlled',
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process',
]
)
print("✅ 浏览器启动成功")
# 创建上下文
context = await browser.new_context(
viewport={'width': 1280, 'height': 720},
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)
print("✅ 浏览器上下文创建成功")
# 创建页面
page = await context.new_page()
print("✅ 页面创建成功")
# 测试1: 访问IP检测网站检查代理IP是否生效
print("\n📍 测试1: 访问IP检测网站...")
try:
await page.goto("http://httpbin.org/ip", timeout=30000)
await asyncio.sleep(2)
# 获取页面内容
content = await page.content()
print("✅ 访问成功,页面内容:")
print(content[:500]) # 只显示前500字符
# 尝试提取IP信息
ip_info = await page.evaluate("() => document.body.innerText")
print(f"\n🌐 当前IP信息:\n{ip_info}")
except Exception as e:
print(f"❌ 测试1失败: {str(e)}")
# 测试2: 访问小红书登录页(检查代理在实际场景中是否可用)
print("\n📍 测试2: 访问小红书登录页...")
try:
await page.goto("https://creator.xiaohongshu.com/login", timeout=30000)
await asyncio.sleep(3)
title = await page.title()
url = page.url
print(f"✅ 访问成功")
print(f" 页面标题: {title}")
print(f" 当前URL: {url}")
except Exception as e:
print(f"❌ 测试2失败: {str(e)}")
# 测试3: 访问大麦网(测试目标网站)
print("\n📍 测试3: 访问大麦网...")
try:
await page.goto("https://www.damai.cn/", timeout=30000)
await asyncio.sleep(3)
title = await page.title()
url = page.url
print(f"✅ 访问成功")
print(f" 页面标题: {title}")
print(f" 当前URL: {url}")
except Exception as e:
print(f"❌ 测试3失败: {str(e)}")
print(f"\n{proxy_config['name']} 测试完成")
except Exception as e:
print(f"\n{proxy_config['name']} 测试失败: {str(e)}")
import traceback
traceback.print_exc()
finally:
# 清理资源
try:
if browser:
await browser.close()
print("🧹 浏览器已关闭")
if playwright:
await playwright.stop()
print("🧹 Playwright已停止")
except Exception as e:
print(f"⚠️ 清理资源时出错: {str(e)}")
async def test_all_proxies():
"""测试所有代理IP"""
print("\n" + "="*60)
print("🎯 大麦固定代理IP测试")
print("="*60)
print(f"📊 共配置 {len(DAMAI_PROXIES)} 个代理IP")
# 依次测试每个代理
for i, proxy_config in enumerate(DAMAI_PROXIES, 1):
print(f"\n\n{'#'*60}")
print(f"# 测试进度: {i}/{len(DAMAI_PROXIES)}")
print(f"{'#'*60}")
await test_proxy(proxy_config)
# 测试间隔
if i < len(DAMAI_PROXIES):
print(f"\n⏳ 等待5秒后测试下一个代理...")
await asyncio.sleep(5)
print("\n" + "="*60)
print("🎉 所有代理测试完成!")
print("="*60)
async def test_single_proxy(index: int = 0):
"""
测试单个代理IP
Args:
index: 代理索引0或1
"""
if index < 0 or index >= len(DAMAI_PROXIES):
print(f"❌ 无效的代理索引: {index},请使用 0 或 1")
return
await test_proxy(DAMAI_PROXIES[index])
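if __name__ == "__main__":
    # Set the event loop policy on Windows
    if sys.platform == 'win32':
        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    # Parse the command-line arguments
    if len(sys.argv) > 1:
        # Parse the index separately from the run call, so that a ValueError
        # raised by the test itself is not misreported as a bad argument
        try:
            proxy_index = int(sys.argv[1])
        except ValueError:
            print("❌ Invalid argument, usage: python test_damai_proxy.py [0|1]")
            print(" 0: test proxy 1")
            print(" 1: test proxy 2")
            print(" no argument: test all proxies")
        else:
            print(f"🎯 Testing a single proxy (index: {proxy_index})")
            asyncio.run(test_single_proxy(proxy_index))
    else:
        # Test all proxies
        asyncio.run(test_all_proxies())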
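
Review note: `DAMAI_PROXIES` in this file commits the proxy usernames and passwords to the repository. One possible alternative, sketched below and not part of the commit, is to read them from environment variables and to mask the password whenever a proxy is printed. The variable naming scheme (`DAMAI_PROXY_1_SERVER` and so on) and both function names are hypothetical.

```python
import os
from typing import Dict, List, Mapping


def load_proxies_from_env(
    env: Mapping[str, str] = os.environ,
    prefix: str = "DAMAI_PROXY",  # hypothetical variable prefix
) -> List[Dict[str, str]]:
    """Hypothetical loader: read <prefix>_1_SERVER, <prefix>_2_SERVER, ... until the first gap."""
    proxies = []
    index = 1
    while f"{prefix}_{index}_SERVER" in env:
        proxies.append({
            "name": f"Damai proxy {index}",
            "server": env[f"{prefix}_{index}_SERVER"],
            "username": env.get(f"{prefix}_{index}_USERNAME", ""),
            "password": env.get(f"{prefix}_{index}_PASSWORD", ""),
        })
        index += 1
    return proxies


def describe_proxy(proxy: Mapping[str, str]) -> str:
    """Render a proxy for log output without revealing the password."""
    return f"{proxy['name']}: {proxy['server']} ({proxy['username']} / ****)"
```

The returned dictionaries keep the same `name`, `server`, `username` and `password` keys that `test_proxy` reads, so under this assumption the rest of the script would not need to change.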


@@ -0,0 +1,282 @@
"""
Compares what the page returns in headed mode versus headless mode.
"""
import asyncio
from playwright.async_api import async_playwright
import sys
async def test_headless_comparison(proxy_index: int = 0):
    """Compare headed mode against headless mode."""
    print(f"\n{'='*60}")
    print("🔍 Comparing headed mode vs headless mode")
    print(f"{'='*60}")
    # Read the proxy settings from the proxy configuration
    from damai_proxy_config import get_proxy_config
    proxy_config = get_proxy_config(proxy_index)
    print(f"✅ Using proxy: proxy {proxy_index + 1}")
    print(f" Proxy server: {proxy_config['server']}")
    # Build the Playwright proxy object directly from the config fields.
    # Composing a URL and splitting it on '@' and ':' again would break
    # for credentials that contain either character.
    proxy_server = proxy_config['server'].replace('http://', '').replace('https://', '')
    proxy_config_obj = {
        "server": f"http://{proxy_server}",
        "username": proxy_config['username'],
        "password": proxy_config['password']
    }
    # Print the proxy object without the password
    print(f" Proxy object: server={proxy_config_obj['server']}, username={proxy_config_obj['username']}")
    # Headless run
    print("\n🧪 Test 1/2: headless mode (headless=True)")
    await test_single_mode(True, proxy_config_obj)
    # Headed run
    print("\n🧪 Test 2/2: headed mode (headless=False)")
    await test_single_mode(False, proxy_config_obj)
    print(f"\n{'='*60}")
    print("✅ Comparison finished!")
    print("="*60)
async def test_single_mode(headless: bool, proxy_config_obj: dict):
"""测试单个模式"""
mode_name = "无头模式" if headless else "有头模式"
print(f" 正在启动浏览器 ({mode_name})...")
try:
async with async_playwright() as p:
# 启动浏览器
browser = await p.chromium.launch(
headless=headless,
proxy=proxy_config_obj,
# 添加一些额外参数以提高稳定性
args=[
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled',
]
)
# 创建上下文
context = await browser.new_context(
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
viewport={'width': 1280, 'height': 720}
)
# 创建页面
page = await context.new_page()
# 访问小红书登录页面
print(f" 访问小红书登录页...")
try:
# 使用不同的wait_until策略
await page.goto('https://creator.xiaohongshu.com/login',
wait_until='domcontentloaded',
timeout=15000)
# 等待一段时间让页面内容加载
await asyncio.sleep(3)
# 获取页面信息
title = await page.title()
url = page.url
content = await page.content()
content_len = len(content)
print(f"{mode_name} - 访问成功")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
# 检查关键元素
phone_input = await page.query_selector('input[placeholder="手机号"]')
if phone_input:
print(f" ✅ 找到手机号输入框")
else:
print(f" ❌ 未找到手机号输入框")
# 查找所有input元素
inputs = await page.query_selector_all('input')
print(f" 找到 {len(inputs)} 个input元素")
if content_len == 0:
print(f" ⚠️ 页面内容为空")
elif "验证" in content or "captcha" in content.lower() or "安全" in content:
print(f" ⚠️ 检测到验证或安全提示")
else:
print(f" ✅ 页面内容正常")
except Exception as e:
print(f"{mode_name} - 访问失败: {str(e)}")
await browser.close()
print(f" 🔄 {mode_name} 浏览器已关闭")
except Exception as e:
print(f"{mode_name} - 测试异常: {str(e)}")
async def test_with_different_wait_strategies(proxy_index: int = 0):
"""测试不同的页面等待策略"""
print(f"\n{'='*60}")
print(f"🔍 测试不同页面等待策略")
print(f"{'='*60}")
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
proxy_parts = proxy_url.replace('http://', '').replace('https://', '').split('@')
if len(proxy_parts) == 2:
auth_part = proxy_parts[0]
server_part = proxy_parts[1]
username, password = auth_part.split(':')
proxy_config_obj = {
"server": f"http://{server_part}",
"username": username,
"password": password
}
else:
proxy_config_obj = {"server": proxy_url}
wait_strategies = [
('domcontentloaded', 'DOM内容加载完成'),
('load', '页面完全加载'),
('networkidle', '网络空闲'),
('commit', '导航提交')
]
for wait_strategy, description in wait_strategies:
print(f"\n🧪 测试等待策略: {description} ({wait_strategy})")
try:
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True, # 使用无头模式进行测试
proxy=proxy_config_obj
)
context = await browser.new_context(
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)
page = await context.new_page()
try:
print(f" 访问小红书登录页 (wait_until='{wait_strategy}')...")
await page.goto('https://creator.xiaohongshu.com/login',
wait_until=wait_strategy,
timeout=15000)
# 额外等待时间
await asyncio.sleep(2)
content = await page.content()
content_len = len(content)
print(f" ✅ 访问成功")
print(f" 内容长度: {content_len} 字符")
# 检查手机号输入框
phone_input = await page.query_selector('input[placeholder="手机号"]')
if phone_input:
print(f" ✅ 找到手机号输入框")
else:
print(f" ❌ 未找到手机号输入框")
except Exception as e:
print(f" ❌ 访问失败: {str(e)}")
await browser.close()
except Exception as e:
print(f" ❌ 测试异常: {str(e)}")
def explain_page_loading_factors():
"""解释影响页面加载的因素"""
print("="*60)
print("💡 影响页面加载的因素")
print("="*60)
print("\n1. 浏览器模式差异:")
print(" • 有头模式: 浏览器界面可见,渲染更完整")
print(" • 无头模式: 后台运行,可能加载策略略有不同")
print("\n2. 页面等待策略:")
print(" • domcontentloaded: DOM构建完成推荐")
print(" • load: 所有资源加载完成")
print(" • networkidle: 网络空闲(可能等待较长时间)")
print("\n3. 反检测措施:")
print(" • 浏览器指纹混淆")
print(" • User-Agent设置")
print(" • 禁用webdriver属性")
print("\n4. 网络因素:")
print(" • 代理IP质量")
print(" • 网络延迟")
print(" • 目标网站反爬虫机制")
async def main():
"""主函数"""
explain_page_loading_factors()
print(f"\n{'='*60}")
print("🎯 选择测试模式")
print("="*60)
print("\n1. 有头模式 vs 无头模式对比测试")
print("2. 不同页面等待策略测试")
try:
choice = input("\n请选择测试模式 (1-2, 默认为1): ").strip()
if choice not in ['1', '2']:
choice = '1'
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
if choice == '1':
await test_headless_comparison(proxy_idx)
elif choice == '2':
await test_with_different_wait_strategies(proxy_idx)
print(f"\n{'='*60}")
print("✅ 测试完成!")
print("="*60)
except KeyboardInterrupt:
print("\n\n⚠️ 测试被用户中断")
except Exception as e:
print(f"\n❌ 测试过程中出现错误: {str(e)}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())
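
Review note: several functions in these scripts turn a `http://user:password@host:port` string into a Playwright proxy object by splitting on `@` and `:`. The sketch below shows one way this could be centralized with the standard library URL parser. It is an illustration under assumptions, not the commit's code, and the name `to_playwright_proxy` is hypothetical.

```python
from typing import Dict
from urllib.parse import unquote, urlsplit


def to_playwright_proxy(proxy_url: str) -> Dict[str, str]:
    """Hypothetical helper: split a proxy URL into server, username and password."""
    parts = urlsplit(proxy_url)
    if not parts.hostname:
        raise ValueError(f"Not a valid proxy URL: {proxy_url!r}")
    # Rebuild the server part without the credentials
    server = f"{parts.scheme or 'http'}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    proxy = {"server": server}
    if parts.username:
        # Credentials may be percent-encoded when they contain ':' or '@'
        proxy["username"] = unquote(parts.username)
        proxy["password"] = unquote(parts.password or "")
    return proxy
```

The resulting dictionary has the `server`, `username` and `password` keys that the scripts already pass as `proxy=` to `chromium.launch`.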


@@ -0,0 +1,356 @@
"""
Example of using a proxy together with headed mode.
Shows how to keep the browser window visible while traffic goes through a proxy.
"""
import asyncio
from playwright.async_api import async_playwright
import sys
async def test_proxy_with_headless_false(proxy_index: int = 0):
"""使用代理并开启有头模式测试"""
print(f"\n{'='*60}")
print(f"🔍 测试代理 + 有头模式")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
print(f" 有头模式: 开启")
try:
async with async_playwright() as p:
# 配置代理
proxy_parts = proxy_url.replace('http://', '').replace('https://', '').split('@')
if len(proxy_parts) == 2:
auth_part = proxy_parts[0]
server_part = proxy_parts[1]
username, password = auth_part.split(':')
proxy_config_obj = {
"server": f"http://{server_part}",
"username": username,
"password": password
}
else:
proxy_config_obj = {"server": proxy_url}
print(f" 配置的代理对象: {proxy_config_obj}")
# 启动浏览器 - 使用有头模式
browser = await p.chromium.launch(
headless=False, # 有头模式,可以看到浏览器界面
proxy=proxy_config_obj
)
# 创建上下文
context = await browser.new_context(
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)
# 创建页面
page = await context.new_page()
print(f"\n🌐 访问百度测试代理连接...")
try:
await page.goto('https://www.baidu.com', wait_until='networkidle', timeout=15000)
await asyncio.sleep(2)
title = await page.title()
url = page.url
print(f" ✅ 百度访问成功")
print(f" 标题: {title}")
print(f" URL: {url}")
except Exception as e:
print(f" ❌ 百度访问失败: {str(e)}")
print(f"\n🌐 访问小红书创作者平台...")
try:
await page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=15000)
await asyncio.sleep(3)
title = await page.title()
url = page.url
content_len = len(await page.content())
print(f" 访问结果:")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
if content_len == 0:
print(f" ⚠️ 页面内容为空")
else:
print(f" ✅ 页面加载成功")
except Exception as e:
print(f" ❌ 小红书访问失败: {str(e)}")
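print(f"\n⏸️ The browser stays open so that you can inspect the page")
print(f" The proxy is active and the browser window is visible")
print(f" Press Enter to close the browser...")
# Wait for the user without blocking the event loop
await asyncio.get_running_loop().run_in_executor(None, input)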
await browser.close()
print(f"✅ 浏览器已关闭")
except Exception as e:
print(f"❌ 测试过程异常: {str(e)}")
import traceback
traceback.print_exc()
async def test_xhs_login_with_headless_false(phone: str, proxy_index: int = 0):
"""
使用有头模式测试小红书登录流程
Args:
phone: 手机号
proxy_index: 代理索引 (0 或 1)
"""
print(f"\n{'='*60}")
print(f"📱 使用有头模式测试小红书登录")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
print(f" 手机号: {phone}")
print(f" 有头模式: 开启")
# 创建登录服务,使用有头模式
from xhs_login import XHSLoginService
login_service = XHSLoginService(use_pool=False) # 不使用池,便于调试
try:
# 初始化浏览器(使用代理 + 有头模式)
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
# 注意XHSLoginService 内部使用了浏览器池模式,我们先看看如何修改它来支持有头模式
print(" 正在启动浏览器(使用代理 + 有头模式)...")
# 直接使用Playwright创建有头模式的浏览器
async with async_playwright() as p:
# 配置代理
proxy_parts = proxy_url.replace('http://', '').replace('https://', '').split('@')
if len(proxy_parts) == 2:
auth_part = proxy_parts[0]
server_part = proxy_parts[1]
username, password = auth_part.split(':')
proxy_config_obj = {
"server": f"http://{server_part}",
"username": username,
"password": password
}
else:
proxy_config_obj = {"server": proxy_url}
# 启动浏览器 - 有头模式
browser = await p.chromium.launch(
headless=False, # 有头模式
proxy=proxy_config_obj
)
context = await browser.new_context(
user_agent=user_agent,
viewport={'width': 1280, 'height': 720}
)
page = await context.new_page()
print("✅ 浏览器启动成功(有头模式 + 代理)")
# 访问小红书登录页面
print(f"\n🌐 访问小红书创作者平台登录页...")
await page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=30000)
await asyncio.sleep(2)
print(f"✅ 进入登录页面")
print(f" 当前URL: {page.url}")
# 查找手机号输入框
print(f"\n🔍 查找手机号输入框...")
try:
# 尝试多种选择器
phone_input_selectors = [
'input[placeholder="手机号"]',
'input[placeholder*="手机"]',
'input[type="tel"]',
'input[type="text"]'
]
phone_input = None
for selector in phone_input_selectors:
try:
phone_input = await page.wait_for_selector(selector, timeout=3000)
if phone_input:
print(f" ✅ 找到手机号输入框: {selector}")
break
except Exception:
continue
if phone_input:
# 输入手机号
await phone_input.fill(phone)
print(f" ✅ 已输入手机号: {phone}")
# 等待界面更新
await asyncio.sleep(1)
# 查找发送验证码按钮
print(f"\n🔍 查找发送验证码按钮...")
code_button_selectors = [
'text="发送验证码"',
'text="获取验证码"',
'button:has-text("验证码")',
'button:has-text("发送")',
'div:has-text("验证码")'
]
code_button = None
for selector in code_button_selectors:
try:
code_button = await page.wait_for_selector(selector, timeout=3000)
if code_button:
print(f" ✅ 找到验证码按钮: {selector}")
break
except Exception:
continue
if code_button:
print(f"\n Found both the phone number input and the verification code button")
print(f" You can click the send button manually in the browser")
print(f" The verification code will be sent to: {phone}")
else:
print(f" ❌ 未找到发送验证码按钮")
else:
print(f" ❌ 未找到手机号输入框")
print(f"\n📄 页面上可用的输入框:")
inputs = await page.query_selector_all('input')
for i, inp in enumerate(inputs):
try:
placeholder = await inp.get_attribute('placeholder')
input_type = await inp.get_attribute('type')
print(f" 输入框 {i+1}: type={input_type}, placeholder={placeholder}")
except Exception:
continue
except Exception as e:
print(f" ❌ 操作失败: {str(e)}")
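# Keep the browser open so that the user can inspect the page
print(f"\n⏸️ The browser stays open so that you can inspect the page elements")
print(f" Press Enter to close the browser...")
await asyncio.get_running_loop().run_in_executor(None, input)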
await browser.close()
print(f"✅ 浏览器已关闭")
except Exception as e:
print(f"❌ 测试过程异常: {str(e)}")
import traceback
traceback.print_exc()
def show_headless_comparison():
"""显示有头模式和无头模式的对比"""
print("="*60)
print("💡 有头模式 vs 无头模式对比")
print("="*60)
print("\n有头模式 (headless=False):")
print(" ✅ 优点:")
print(" • 可以看到浏览器界面,便于调试")
print(" • 可以观察页面加载过程")
print(" • 可以手动与页面交互")
print(" • 有助于识别页面元素选择器")
print("")
print(" ❌ 缺点:")
print(" • 占用屏幕空间")
print(" • 可能影响用户其他操作")
print(" • 资源消耗稍大")
print("\n无头模式 (headless=True):")
print(" ✅ 优点:")
print(" • 不显示浏览器界面,后台运行")
print(" • 资源消耗较少")
print(" • 适合自动化任务")
print(" • 可以在服务器环境运行")
print("")
print(" ❌ 缺点:")
print(" • 无法直观看到页面")
print(" • 调试相对困难")
print("\n🎯 使用建议:")
print(" • 开发调试时使用有头模式")
print(" • 生产环境使用无头模式")
print(" • 代理配置在两种模式下都有效")
async def main():
"""主函数"""
show_headless_comparison()
print(f"\n{'='*60}")
print("🎯 选择测试模式")
print("="*60)
print("\n1. 基础代理 + 有头模式测试")
print("2. 小红书登录 + 有头模式测试")
try:
choice = input("\n请选择测试模式 (1-2, 默认为1): ").strip()
if choice not in ['1', '2']:
choice = '1'
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
if choice == '1':
await test_proxy_with_headless_false(proxy_idx)
elif choice == '2':
phone = input("请输入手机号: ").strip()
if not phone:
print("❌ 手机号不能为空")
return
await test_xhs_login_with_headless_false(phone, proxy_idx)
print(f"\n{'='*60}")
print("✅ 测试完成!")
print("="*60)
except KeyboardInterrupt:
print("\n\n⚠️ 测试被用户中断")
except Exception as e:
print(f"\n❌ 测试过程中出现错误: {str(e)}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())

261
backend/test_login_flow.py Normal file

@@ -0,0 +1,261 @@
"""
Xiaohongshu verification-code login flow test script.
Tests the complete flow of sending a verification code and logging in.
"""
import asyncio
import sys
from xhs_login import XHSLoginService
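async def test_send_verification_code(phone: str, proxy_index: int = 0):
    """
    Test the send-verification-code flow.

    Args:
        phone: phone number
        proxy_index: proxy index (0 or 1)

    Returns:
        The XHSLoginService instance on success (the caller must close it), otherwise None.
    """
    print(f"\n{'='*60}")
    print("📱 Testing the send-verification-code flow")
    print(f"{'='*60}")
    # Read the proxy settings from the proxy configuration
    from damai_proxy_config import get_proxy_config
    proxy_config = get_proxy_config(proxy_index)
    proxy_server = proxy_config['server'].replace('http://', '')
    proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
    print(f"✅ Using proxy: proxy {proxy_index + 1}")
    print(f" Proxy server: {proxy_config['server']}")
    print(f" Phone number: {phone}")
    # Create the login service
    login_service = XHSLoginService()
    try:
        # Initialize the browser (through the proxy)
        user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
        await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
        print("✅ Browser initialized (proxy enabled)")
        # Send the verification code
        print(f"\n📤 Sending a verification code to {phone}...")
        result = await login_service.send_verification_code(phone)
        if result.get('success'):
            print("✅ Verification code sent!")
            print(f" Message: {result.get('message')}")
            return login_service  # returned so that the login step can reuse it
        print(f"❌ Failed to send the verification code: {result.get('error')}")
    except Exception as e:
        print(f"❌ Unexpected error while sending the verification code: {str(e)}")
        import traceback
        traceback.print_exc()
    # Failure path: release the browser instead of leaking it
    try:
        await login_service.close_browser()
    except Exception as e:
        print(f"⚠️ Failed to close the browser: {str(e)}")
    return None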
async def test_login_with_code(login_service: XHSLoginService, phone: str, code: str):
"""
测试使用验证码登录
Args:
login_service: XHSLoginService实例
phone: 手机号
code: 验证码
"""
print(f"\n{'='*60}")
print(f"🔑 测试使用验证码登录")
print(f"{'='*60}")
print(f" 手机号: {phone}")
print(f" 验证码: {code}")
try:
# 执行登录
result = await login_service.login(phone, code)
if result.get('success'):
print("✅ 登录成功!")
# 显示获取到的Cookies信息
cookies = result.get('cookies', {})
print(f" 获取到 {len(cookies)} 个Cookie")
# 保存完整Cookies到文件
cookies_full = result.get('cookies_full', [])
if cookies_full:
import json
with open('cookies.json', 'w', encoding='utf-8') as f:
json.dump(cookies_full, f, ensure_ascii=False, indent=2)
print(" ✅ 已保存完整Cookies到 cookies.json")
# 显示用户信息
user_info = result.get('user_info', {})
if user_info:
print(f" 用户信息: {list(user_info.keys())}")
return result
else:
print(f"❌ 登录失败: {result.get('error')}")
return result
except Exception as e:
print(f"❌ 登录过程异常: {str(e)}")
import traceback
traceback.print_exc()
return {"success": False, "error": str(e)}
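async def test_complete_login_flow(phone: str, code: str = None, proxy_index: int = 0):
    """
    Test the complete login flow.

    Args:
        phone: phone number
        code: verification code; if None, only the send step is tested
        proxy_index: proxy index
    """
    print("="*60)
    print("🔄 Testing the complete login flow")
    print("="*60)
    # Step 1: send the verification code
    print("\n📋 Step 1: send the verification code")
    login_service = await test_send_verification_code(phone, proxy_index)
    if not login_service:
        print("❌ Failed to send the verification code, aborting")
        return
    try:
        # Log in only if a verification code was provided
        if code:
            print("\n📋 Step 2: log in with the verification code")
            result = await test_login_with_code(login_service, phone, code)
            if result.get('success'):
                print("\n🎉 Complete login flow succeeded!")
            else:
                print(f"\n❌ Complete login flow failed: {result.get('error')}")
        else:
            print("\n⚠️ No verification code was provided, so the login step was skipped")
            print(" Check the code on your phone, then call the login method")
    finally:
        # Release resources even if the login step raises
        await login_service.close_browser()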
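async def test_multiple_proxies_login(phone: str, proxy_indices: list = None):
    """
    Test sending the verification code through several proxies.

    Args:
        phone: phone number
        proxy_indices: list of proxy indices (defaults to [0, 1])
    """
    # Avoid a mutable default argument
    if proxy_indices is None:
        proxy_indices = [0, 1]
    print("="*60)
    print("🔄 Testing login through multiple proxies")
    print("="*60)
    for i, proxy_idx in enumerate(proxy_indices):
        print(f"\n🧪 Testing proxy {proxy_idx + 1} (attempt {i+1})")
        # A verification code can only be sent once, so only the send step is tested here
        login_service = await test_send_verification_code(phone, proxy_idx)
        if login_service:
            print(f" ✅ Proxy {proxy_idx + 1}: verification code sent")
            await login_service.close_browser()
        else:
            print(f" ❌ Proxy {proxy_idx + 1}: failed to send the verification code")
        # Pause between tests
        if i < len(proxy_indices) - 1:
            print(" ⏳ Waiting 3 seconds before testing the next proxy...")
            await asyncio.sleep(3)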
def show_usage_examples():
    """Show usage examples."""
    print("="*60)
    print("💡 Usage examples")
    print("="*60)
    print("\n1. Send a verification code only:")
    print(" # Send a verification code to the phone number through proxy 1")
    print(" await test_send_verification_code('13800138000', proxy_index=0)")
    print("\n2. Complete login flow:")
    print(" # Full flow: send the verification code, then log in")
    print(" await test_complete_login_flow('13800138000', '123456', proxy_index=0)")
    print("\n3. Multiple proxies:")
    print(" # Test several proxies")
    print(" await test_multiple_proxies_login('13800138000', [0, 1])")
async def main():
"""主函数"""
show_usage_examples()
print(f"\n{'='*60}")
print("🎯 选择测试模式")
print("="*60)
print("\n1. 发送验证码测试")
print("2. 完整登录流程测试")
print("3. 多代理测试")
try:
choice = input("\n请选择测试模式 (1-3, 默认为1): ").strip()
if choice not in ['1', '2', '3']:
choice = '1'
phone = input("请输入手机号: ").strip()
if not phone:
print("❌ 手机号不能为空")
return
if choice == '1':
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
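# Close the returned service so that the browser is not left running
login_service = await test_send_verification_code(phone, proxy_idx)
if login_service: await login_service.close_browser()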
elif choice == '2':
code = input("请输入验证码 (留空则只测试发送): ").strip()
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
await test_complete_login_flow(phone, code if code else None, proxy_idx)
elif choice == '3':
await test_multiple_proxies_login(phone)
print(f"\n{'='*60}")
print("✅ 测试完成!")
print("="*60)
except KeyboardInterrupt:
print("\n\n⚠️ 测试被用户中断")
except Exception as e:
print(f"\n❌ 测试过程中出现错误: {str(e)}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())


@@ -0,0 +1,106 @@
"""
Tests the login page configuration.
Verifies that the configuration file controls which login page is used (creator vs home).
"""
import sys
from config import load_config
def test_config_reading():
"""测试配置读取"""
print("="*60)
print("测试配置文件读取")
print("="*60)
# 测试dev配置
print("\n1. 测试开发环境配置 (config.dev.yaml)")
config_dev = load_config('dev')
login_page = config_dev.get_str('login.page', 'creator')
login_headless = config_dev.get_bool('login.headless', False)
print(f" login.page = {login_page}")
print(f" login.headless = {login_headless}")
# 根据配置决定预热URL
if login_page == "home":
preheat_url = "https://www.xiaohongshu.com"
else:
preheat_url = "https://creator.xiaohongshu.com/login"
print(f" 预热URL = {preheat_url}")
# 测试prod配置
print("\n2. 测试生产环境配置 (config.prod.yaml)")
config_prod = load_config('prod')
login_page_prod = config_prod.get_str('login.page', 'creator')
login_headless_prod = config_prod.get_bool('login.headless', False)
print(f" login.page = {login_page_prod}")
print(f" login.headless = {login_headless_prod}")
if login_page_prod == "home":
preheat_url_prod = "https://www.xiaohongshu.com"
else:
preheat_url_prod = "https://creator.xiaohongshu.com/login"
print(f" 预热URL = {preheat_url_prod}")
print("\n" + "="*60)
print("✅ 配置读取测试完成")
print("="*60)
def test_api_parameter_override():
"""测试API参数覆盖配置"""
print("\n" + "="*60)
print("测试API参数覆盖配置")
print("="*60)
config = load_config('dev')
default_login_page = config.get_str('login.page', 'creator')
# 模拟不同的API参数情况
test_cases = [
(None, "应使用配置默认值"),
("creator", "API指定creator"),
("home", "API指定home"),
]
for api_param, description in test_cases:
login_page = api_param if api_param else default_login_page
print(f"\n场景: {description}")
print(f" 配置默认值 = {default_login_page}")
print(f" API参数 = {api_param}")
print(f" 最终使用 = {login_page}")
# 决定URL
if login_page == "home":
url = "https://www.xiaohongshu.com"
page_name = "小红书首页"
else:
url = "https://creator.xiaohongshu.com/login"
page_name = "创作者中心"
print(f" → 将访问: {page_name} ({url})")
print("\n" + "="*60)
print("✅ API参数覆盖测试完成")
print("="*60)
if __name__ == "__main__":
try:
test_config_reading()
test_api_parameter_override()
print("\n🎉 所有测试通过!")
print("\n使用说明:")
print("1. 在 config.dev.yaml 或 config.prod.yaml 中修改 login.page 配置")
print("2. 可选值: creator (创作者中心) 或 home (小红书首页)")
print("3. API请求中的 login_page 参数可以覆盖配置文件的默认值")
print("4. 如果API请求不传 login_page 参数,将使用配置文件中的默认值")
except Exception as e:
print(f"\n❌ 测试失败: {str(e)}")
import traceback
traceback.print_exc()
sys.exit(1)
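
Review note: the script above repeats the same "API parameter overrides the config default, then map the page name to a URL" logic in both test functions. A minimal sketch of how that rule could live in one place follows. The function name `resolve_login_page` and the `LOGIN_URLS` table are hypothetical, and the URLs are the ones used in the script.

```python
from typing import Optional, Tuple

# Page name to URL, taken from the URLs used in the test script
LOGIN_URLS = {
    "creator": "https://creator.xiaohongshu.com/login",
    "home": "https://www.xiaohongshu.com",
}


def resolve_login_page(api_param: Optional[str], config_default: str = "creator") -> Tuple[str, str]:
    """Hypothetical helper: pick the login page, letting the API parameter override the config."""
    page = api_param or config_default
    # Mirror the script: anything other than "home" falls back to the creator center
    if page != "home":
        page = "creator"
    return page, LOGIN_URLS[page]
```

For example, `resolve_login_page(None, config.get_str('login.page', 'creator'))` would apply the configured default, while passing `"home"` as the first argument would override it.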


@@ -0,0 +1,246 @@
"""
Optimized proxy browser configuration.
Works around Xiaohongshu's restrictions on proxy IPs.
"""
import asyncio
from playwright.async_api import async_playwright
import sys
async def test_optimized_proxy_browser(proxy_index: int = 0):
"""测试优化的代理浏览器配置"""
print(f"\n{'='*60}")
print(f"🚀 测试优化的代理浏览器配置")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
try:
async with async_playwright() as p:
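# Configure the proxy
proxy_parts = proxy_url.replace('http://', '').replace('https://', '').split('@')
if len(proxy_parts) == 2:
auth_part = proxy_parts[0]
server_part = proxy_parts[1]
# Split on the first ':' only, so a password that contains ':' still parses
username, password = auth_part.split(':', 1)
proxy_config_obj = {
"server": f"http://{server_part}",
"username": username,
"password": password
}
else:
proxy_config_obj = {"server": proxy_url}
# Mask the password so credentials do not end up in logs
masked_proxy = {k: ('***' if k == 'password' else v) for k, v in proxy_config_obj.items()}
print(f" Proxy object in use: {masked_proxy}")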
# 启动浏览器 - 使用优化参数
browser = await p.chromium.launch(
headless=False, # 使用有头模式,便于观察
proxy=proxy_config_obj,
args=[
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled',
'--disable-background-timer-throttling',
'--disable-renderer-backgrounding',
'--disable-background-networking',
'--enable-features=NetworkService,NetworkServiceInProcess',
'--disable-ipc-flooding-protection',
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process',
'--disable-site-isolation-trials',
'--disable-extensions',
'--disable-breakpad',
'--disable-component-extensions-with-background-pages',
'--disable-hang-monitor',
'--disable-prompt-on-repost',
'--disable-domain-reliability',
'--disable-component-update',
'--hide-scrollbars',
'--mute-audio',
'--no-first-run',
'--no-default-browser-check',
'--metrics-recording-only',
'--force-color-profile=srgb',
'--disable-default-apps',
'--disable-features=TranslateUI',
'--disable-features=Translate',
'--disable-features=OptimizationHints',
'--disable-features=InterestCohortAPI',
'--disable-features=BlinkGenPropertyTrees',
'--disable-features=ImprovedCookieControls',
'--disable-features=SameSiteDefaultChecksMethodRigorously',
'--disable-features=CookieSameSiteByDefaultWhenReportingEnabled',
'--disable-features=AutofillServerCommunication',
'--disable-features=AutofillUseOptimizedLocalStorage',
'--disable-features=CalculateNativeWinOcclusion',
'--disable-features=VizDisplayCompositor',
'--disable-features=VizHitTestQuery',
]
)
# 创建上下文 - 设置浏览器指纹混淆
context = await browser.new_context(
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
viewport={'width': 1280, 'height': 720},
# 隐瞒自动化特征
bypass_csp=True,
java_script_enabled=True,
)
# 创建页面
page = await context.new_page()
# 隐瞒自动化特征
await page.add_init_script("""
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined,
});
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5],
});
Object.defineProperty(navigator, 'languages', {
get: () => ['zh-CN', 'zh', 'en'],
});
// 隐瞒代理检测
delete window.cdc_adoQpoasnfa76pfcZLmcfl_Array;
delete window.cdc_adoQpoasnfa76pfcZLmcfl_Promise;
delete window.cdc_adoQpoasnfa76pfcZLmcfl_Symbol;
""")
print(f"\n🌐 访问百度测试代理连接...")
try:
await page.goto('https://www.baidu.com', wait_until='domcontentloaded', timeout=15000)
await asyncio.sleep(2)
title = await page.title()
url = page.url
print(f" ✅ 百度访问成功")
print(f" 标题: {title}")
print(f" URL: {url}")
except Exception as e:
print(f" ❌ 百度访问失败: {str(e)}")
print(f"\n🌐 访问小红书创作者平台...")
try:
await page.goto('https://creator.xiaohongshu.com/login', wait_until='domcontentloaded', timeout=30000)
await asyncio.sleep(3) # 等待更长时间
title = await page.title()
url = page.url
content = await page.content()
content_len = len(content)
print(f" 访问结果:")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
if content_len == 0:
print(f" ⚠️ 页面内容为空")
elif "验证" in content or "captcha" in content.lower() or "安全" in content:
print(f" ⚠️ 检测到验证或安全提示")
else:
print(f" ✅ 页面加载成功")
# 查找手机号输入框
print(f"\n🔍 查找手机号输入框...")
try:
phone_input = await page.wait_for_selector('input[placeholder="手机号"]', timeout=5000)
if phone_input:
print(f" ✅ 找到手机号输入框")
else:
print(f" ❌ 未找到手机号输入框")
except Exception:
print(f" ❌ 未找到手机号输入框")
# 查找所有input元素
inputs = await page.query_selector_all('input')
print(f" 找到 {len(inputs)} 个input元素")
# 查找发送验证码按钮
print(f"\n🔍 查找发送验证码按钮...")
try:
code_button = await page.wait_for_selector('text="发送验证码"', timeout=5000)
if code_button:
print(f" ✅ 找到发送验证码按钮")
else:
print(f" ❌ 未找到发送验证码按钮")
except Exception:
print(f" ❌ 未找到发送验证码按钮")
except Exception as e:
print(f" ❌ 小红书访问失败: {str(e)}")
print(f"\n⏸️ 浏览器保持打开状态,您可以观察页面")
print(f" 按 Enter 键关闭浏览器...")
input()
await browser.close()
print(f"✅ 浏览器已关闭")
except Exception as e:
print(f"❌ 测试过程异常: {str(e)}")
import traceback
traceback.print_exc()
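def explain_optimizations():
    """Explain the optimizations."""
    print("="*60)
    print("🔧 Optimizations")
    print("="*60)
    print("\n1. Browser launch arguments:")
    print("   • Add more anti-detection flags")
    print("   • Disable features that may trigger detection")
    print("\n2. Browser fingerprint obfuscation:")
    print("   • Hide the webdriver flag")
    print("   • Fake the plugin list")
    print("   • Set realistic languages")
    print("\n3. Page load strategy:")
    print("   • Use domcontentloaded instead of networkidle")
    print("   • Use longer timeouts")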
async def main():
    """Entry point."""
    explain_optimizations()
    print(f"\n{'='*60}")
    print("🎯 Choose a proxy to test")
    print("="*60)
    proxy_choice = input("\nChoose a proxy (0 or 1, default 0): ").strip()
    if proxy_choice not in ['0', '1']:
        proxy_choice = '0'
    proxy_idx = int(proxy_choice)
    await test_optimized_proxy_browser(proxy_idx)
    print(f"\n{'='*60}")
    print("✅ Test finished!")
    print("="*60)


if __name__ == "__main__":
    # Set the event loop policy on Windows
    if sys.platform == 'win32':
        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    # Run the test
    asyncio.run(main())
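
Reviewer note: the script above builds a credentialed proxy URL and then splits it by hand on `@` and `:`. A possible alternative is to let the standard library do the parsing, which also copes with percent-encoded characters in the password. This is a sketch under assumptions: `to_playwright_proxy` is a hypothetical helper name, and the returned dict simply mirrors the `server` / `username` / `password` keys that the script already passes to `chromium.launch(proxy=...)`.

```python
from urllib.parse import unquote, urlsplit


def to_playwright_proxy(proxy_url):
    """Split 'scheme://user:pass@host:port' into a server/username/password dict."""
    parts = urlsplit(proxy_url)
    if not parts.hostname:
        raise ValueError(f"not a proxy URL with scheme and host: {proxy_url!r}")
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    proxy = {"server": server}
    if parts.username is not None:
        # urlsplit leaves percent-escapes in place, so decode them here
        proxy["username"] = unquote(parts.username)
        proxy["password"] = unquote(parts.password or "")
    return proxy
```

A URL without credentials yields only the `server` key, which matches the fallback branch in the script.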

backend/test_oss.py Normal file

@@ -0,0 +1,51 @@
"""
Test the OSS upload feature
"""
import sys
from oss_utils import OSSUploader


def test_oss_connection():
    """Test the OSS connection."""
    print("=" * 60)
    print("Testing the Alibaba Cloud OSS connection")
    print("=" * 60)
    try:
        # Create the OSS uploader
        uploader = OSSUploader()
        print(f"\n✅ OSS configuration:")
        print(f"   Bucket: {uploader.bucket_name}")
        print(f"   Endpoint: {uploader.endpoint}")
        print(f"   Access Key ID: {uploader.access_key_id[:8]}...")
        # Check that the bucket is reachable
        try:
            # List at most one object in the bucket
            result = uploader.bucket.list_objects(prefix=uploader.base_path, max_keys=1)
            print(f"\n✅ Bucket access succeeded!")
            print(f"   Base path: {uploader.base_path}")
            if result.object_list:
                print(f"   Sample object: {result.object_list[0].key}")
        except Exception as e:
            print(f"\n❌ Bucket access failed: {e}")
            return False
        print("\n" + "=" * 60)
        print("✅ OSS configuration test passed!")
        print("=" * 60)
        return True
    except Exception as e:
        print(f"\n❌ OSS initialization failed: {e}")
        print("\nPlease check the configuration:")
        print("  1. Are the Access Key ID and Secret correct?")
        print("  2. Is the bucket name correct?")
        print("  3. Does the endpoint region match the bucket?")
        return False


if __name__ == "__main__":
    success = test_oss_connection()
    sys.exit(0 if success else 1)


@@ -0,0 +1,16 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import hashlib

passwords = [
    "123456",
    "password",
    "admin123",
]

print("=== Python SHA256 password hashing test ===")
for pwd in passwords:
    hash_result = hashlib.sha256(pwd.encode('utf-8')).hexdigest()
    print(f"Password: {pwd}")
    print(f"SHA256: {hash_result}\n")
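
Reviewer note: the script above prints plain, unsalted SHA-256 digests, presumably so they can be compared with what another service produces. If the stored format is free to change, a salted key-derivation function from the standard library is a common alternative. The sketch below is only an illustration, not what this commit does: `hash_password` and `verify_password` are hypothetical names, the iteration count is an assumed value, and any other service that checks these passwords would have to use the same scheme.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=600_000):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a random 16-byte salt."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected, iterations=600_000):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

The salt and iteration count need to be stored next to the digest so that verification can repeat the same derivation.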


@@ -0,0 +1,152 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Fixed proxy IP test script
Sends a request through the proxy server with requests to check that the proxy works
"""
import requests
import json
from damai_proxy_config import get_proxy_config, get_all_enabled_proxies
def test_proxy_requests(proxy_info, target_url="http://httpbin.org/ip"):
    """
    Test a proxy IP with requests.

    Args:
        proxy_info: dict with the proxy's server, username and password
        target_url: URL to request through the proxy
    """
    print(f"\n{'='*60}")
    print(f"🔍 Testing proxy: {proxy_info.get('name', 'Unknown')}")
    print(f"   Server: {proxy_info['server']}")
    print(f"   Username: {proxy_info['username']}")
    print(f"   Target URL: {target_url}")
    print(f"{'='*60}")
    # Build the proxy URL with credentials
    proxy_server = proxy_info['server'].replace('http://', '')
    proxy_url = f"http://{proxy_info['username']}:{proxy_info['password']}@{proxy_server}"
    proxies = {
        "http": proxy_url,
        "https": proxy_url
    }
    try:
        # Send the test request
        print("🚀 Sending test request...")
        response = requests.get(target_url, proxies=proxies, timeout=5)  # short 5-second timeout
        if response.status_code == 200:
            print(f"✅ Proxy test succeeded! Status code: {response.status_code}")
            # Try to parse the IP information
            try:
                ip_info = response.json()
                print(f"🌐 Current IP info: {json.dumps(ip_info, indent=2, ensure_ascii=False)}")
            except ValueError:  # the body is not JSON
                print(f"🌐 Page content (first 500 characters): {response.text[:500]}")
            return True
        else:
            print(f"❌ Proxy test failed! Status code: {response.status_code}")
            print(f"Response body: {response.text[:200]}")
            return False
    except requests.exceptions.ProxyError:
        print("❌ Proxy error: could not connect to the proxy server")
        return False
    except requests.exceptions.ConnectTimeout:
        print("❌ Connection timeout: the proxy server did not respond in time")
        return False
    except requests.exceptions.RequestException as e:
        print(f"❌ Request error: {str(e)}")
        return False
def test_all_proxies():
    """Test every configured proxy."""
    print("🎯 Testing all proxy IPs")
    proxies = get_all_enabled_proxies()
    if not proxies:
        print("❌ No enabled proxy configuration found")
        return []
    print(f"📊 Found {len(proxies)} proxy IPs")
    results = []
    for i, proxy in enumerate(proxies, 1):
        print(f"\n\n{'#'*60}")
        print(f"# Progress: {i}/{len(proxies)}")
        print(f"{'#'*60}")
        success = test_proxy_requests(proxy)
        results.append({
            'proxy': proxy['name'],
            'server': proxy['server'],
            'success': success
        })
        if i < len(proxies):
            print(f"\n⏳ Waiting 2 seconds before testing the next proxy...")
            import time
            time.sleep(2)
    # Print the result summary
    print(f"\n{'='*60}")
    print("📊 Test result summary:")
    print(f"{'='*60}")
    success_count = 0
    for result in results:
        status = "✅ OK" if result['success'] else "❌ Failed"
        print(f"  {result['proxy']} ({result['server']}) - {status}")
        if result['success']:
            success_count += 1
    print(f"\n📈 Overall success rate: {success_count}/{len(results)} ({success_count/len(results)*100:.1f}%)")
    # List the proxies that worked and can be used for Xiaohongshu
    successful_proxies = [r for r in results if r['success']]
    if successful_proxies:
        print(f"\n🎉 These proxies can be used for Xiaohongshu login and publishing:")
        for proxy in successful_proxies:
            print(f"  - {proxy['proxy']}: {proxy['server']}")
    return results
def test_xhs_proxy_format():
    """Show the proxy format used for Xiaohongshu."""
    print(f"\n{'='*60}")
    print("🔧 Proxy format for Playwright")
    print(f"{'='*60}")
    proxies = get_all_enabled_proxies()
    for proxy in proxies:
        server = proxy['server'].replace('http://', '')  # strip the http:// prefix
        # The password is masked so that credentials do not end up in logs
        proxy_url = f"http://{proxy['username']}:***@{server}"
        print(f"  {proxy['name']}:")
        print(f"    Server address: {proxy['server']}")
        print(f"    Playwright format (password masked): {proxy_url}")
        print()


if __name__ == "__main__":
    print("🚀 Testing fixed proxy IPs")
    # Show the proxy format
    test_xhs_proxy_format()
    # Test every proxy
    test_all_proxies()
    print(f"\n{'='*60}")
    print("🎉 Proxy test finished!")
    print(f"{'='*60}")
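
Reviewer note: the script above interpolates the username and password straight into the proxy URL. If either value contains characters such as `@` or `:`, the resulting URL becomes ambiguous. One way to avoid that is to percent-encode the credentials first. A minimal sketch, assuming the same `server` / `username` / `password` fields as the proxy config; `build_requests_proxies` is a hypothetical helper, and it relies on the HTTP client decoding the credentials again, which should be checked against the installed `requests` version.

```python
from urllib.parse import quote


def build_requests_proxies(server, username, password):
    """Build a proxies mapping with percent-encoded credentials."""
    # Drop any scheme prefix so the URL is assembled in exactly one place
    host = server.split("://", 1)[-1]
    auth = f"{quote(username, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{auth}@{host}"
    return {"http": proxy_url, "https": proxy_url}
```

The returned mapping has the same shape as the `proxies` dict the script passes to `requests.get`.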


@@ -0,0 +1,126 @@
"""
Detailed fixed proxy IP test script
Tests how a proxy IP behaves in Playwright, with extra debugging output
"""
import asyncio
import json
import sys
from xhs_login import XHSLoginService
from damai_proxy_config import get_proxy_config
async def test_proxy_detailed(proxy_index: int = 0):
"""详细测试代理IP"""
print(f"\n{'='*60}")
print(f"🔍 详细测试代理: 代理{proxy_index + 1}")
print(f"{'='*60}")
# 获取代理配置
try:
proxy_config = get_proxy_config(proxy_index)
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_config['server'].replace('http://', '', 1)}"  # strip the http:// prefix, then reassemble
print(f"✅ 获取代理配置成功: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
except Exception as e:
print(f"❌ 获取代理配置失败: {str(e)}")
return None
# 创建登录服务实例
login_service = XHSLoginService(use_pool=False) # 不使用池,便于调试
try:
# 初始化浏览器(使用代理)
print(f"\n🚀 正在启动浏览器(使用代理)...")
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
print("✅ 浏览器启动成功")
# 测试访问普通网站
print(f"\n📍 测试访问普通网站(百度)...")
try:
await login_service.page.goto('https://www.baidu.com', wait_until='networkidle', timeout=10000)
await asyncio.sleep(2)
title = await login_service.page.title()
url = login_service.page.url
print(f"✅ 百度访问成功")
print(f" 页面标题: {title}")
print(f" 当前URL: {url}")
except Exception as e:
print(f"❌ 百度访问失败: {str(e)}")
# 测试访问IP检测网站
print(f"\n📍 测试访问IP检测网站...")
try:
await login_service.page.goto('http://httpbin.org/ip', wait_until='networkidle', timeout=10000)
await asyncio.sleep(2)
content = await login_service.page.content()
print(f"✅ IP检测网站访问成功")
print(f" 页面内容: {content[:200]}...")
except Exception as e:
print(f"❌ IP检测网站访问失败: {str(e)}")
# 测试访问小红书创作者平台
print(f"\n📍 测试访问小红书创作者平台...")
try:
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=20000) # 增加超时时间
await asyncio.sleep(3) # 等待更长时间
title = await login_service.page.title()
url = login_service.page.url
print(f"✅ 小红书访问成功")
print(f" 页面标题: '{title}'")
print(f" 当前URL: {url}")
# 检查页面内容
content = await login_service.page.content()
if "验证" in content or "captcha" in content.lower() or "block" in content.lower() or "安全验证" in content:
print("⚠️ 检测到可能的验证或拦截")
else:
print("✅ 未检测到验证拦截")
except Exception as e:
print(f"❌ 小红书访问失败: {str(e)}")
# 尝试访问普通页面看看是否完全被封
try:
await login_service.page.goto('https://www.google.com', wait_until='networkidle', timeout=10000)
print(" 提示: 代理可以访问其他网站,但可能被小红书限制")
except Exception:
print(" 提示: 代理可能完全被限制")
print(f"\n✅ 代理{proxy_index + 1} 详细测试完成")
return login_service
except Exception as e:
print(f"❌ 代理{proxy_index + 1} 详细测试失败: {str(e)}")
import traceback
traceback.print_exc()
return None
finally:
# 关闭浏览器
await login_service.close_browser()
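async def main():
    """Main test routine."""
    print("\n" + "="*60)
    print("🎯 Detailed fixed proxy IP test")
    print("="*60)
    # Test both proxies
    proxy_count = 2
    for i in range(proxy_count):
        await test_proxy_detailed(i)
        # Only pause between proxies, not after the last one
        if i < proxy_count - 1:
            print(f"\n⏳ Waiting 3 seconds before testing the next proxy...")
            await asyncio.sleep(3)
    print(f"\n{'='*60}")
    print("🎉 Detailed test finished!")
    print("="*60)


if __name__ == "__main__":
    # Set the event loop policy on Windows
    if sys.platform == 'win32':
        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    # Run the test
    asyncio.run(main())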

backend/test_proxy_xhs.py Normal file

@@ -0,0 +1,219 @@
"""
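Test script for Xiaohongshu login and publishing behind a fixed proxy IP
Exercises the Xiaohongshu login and publishing features through a fixed proxy IP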
"""
import asyncio
import json
import sys
from xhs_login import XHSLoginService
from xhs_publish import XHSPublishService
from damai_proxy_config import get_proxy_config
async def test_login_with_proxy(proxy_index: int = 0):
"""使用指定代理测试小红书登录"""
print(f"\n{'='*60}")
print(f"🔍 开始测试代理登录: 代理{proxy_index + 1}")
print(f"{'='*60}")
# 获取代理配置
try:
proxy_config = get_proxy_config(proxy_index)
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_config['server'].replace('http://', '', 1)}"  # strip the http:// prefix, then reassemble
print(f"✅ 获取代理配置成功: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
except Exception as e:
print(f"❌ 获取代理配置失败: {str(e)}")
return None
# 创建登录服务实例
login_service = XHSLoginService()
try:
# 初始化浏览器(使用代理)
print(f"\n🚀 正在启动浏览器(使用代理)...")
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
print("✅ 浏览器启动成功")
# 访问小红书创作者平台
print(f"\n📍 访问小红书创作者平台...")
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=30000)
await asyncio.sleep(2)
title = await login_service.page.title()
url = login_service.page.url
print(f"✅ 访问成功")
print(f" 页面标题: {title}")
print(f" 当前URL: {url}")
# 检查是否被代理拦截或出现验证码
content = await login_service.page.content()
if "验证" in content or "captcha" in content.lower() or "block" in content.lower():
print("⚠️ 检测到可能的验证或拦截")
print(f"\n✅ 代理{proxy_index + 1} 连接测试完成")
return login_service
except Exception as e:
print(f"❌ Proxy {proxy_index + 1} test failed: {str(e)}")
import traceback
traceback.print_exc()
# Close the browser on failure so it is not leaked; on success the caller closes it
await login_service.close_browser()
return None
async def test_publish_with_proxy(cookies, proxy_index: int = 0):
"""使用指定代理测试小红书发文"""
print(f"\n{'='*60}")
print(f"📝 开始测试代理发文: 代理{proxy_index + 1}")
print(f"{'='*60}")
# 获取代理配置
try:
proxy_config = get_proxy_config(proxy_index)
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_config['server'].replace('http://', '', 1)}"  # strip the http:// prefix, then reassemble
print(f"✅ 获取代理配置成功: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
except Exception as e:
print(f"❌ 获取代理配置失败: {str(e)}")
return None
# 准备测试数据
title = "【代理测试】固定IP代理发布测试"
content = """这是一条通过固定IP代理发布的测试笔记 📝
测试内容:
- 验证代理IP是否正常工作
- 检查发布功能是否正常
- 确认网络连接稳定性
如果你看到这条笔记,说明代理发布成功了!
#代理测试 #自动化发布 #网络测试"""
# 测试图片(可选)
images = [] # 可以添加图片路径进行测试
# 标签
tags = ["代理测试", "自动化发布", "网络测试"]
try:
# 创建发布服务
print(f"\n🚀 创建发布服务(使用代理: 代理{proxy_index + 1}...")
publisher = XHSPublishService(cookies, proxy=proxy_url)
# 执行发布
print(f"\n📤 开始发布笔记...")
result = await publisher.publish(
title=title,
content=content,
images=images if images else None,
tags=tags
)
# 显示结果
print(f"\n{'='*50}")
print("发布结果:")
print(json.dumps(result, ensure_ascii=False, indent=2))
print("="*50)
if result.get('success'):
print(f"\n✅ 代理{proxy_index + 1} 发布测试成功!")
if 'url' in result:
print(f"📎 笔记链接: {result['url']}")
else:
print(f"\n❌ 代理{proxy_index + 1} 发布测试失败: {result.get('error')}")
return result
except Exception as e:
print(f"❌ 代理{proxy_index + 1} 发布测试异常: {str(e)}")
import traceback
traceback.print_exc()
return None
async def main():
"""主测试函数"""
print("\n" + "="*60)
print("🎯 固定代理IP下小红书登录发文功能测试")
print("="*60)
# 测试代理连接
login_service = None
for i in range(2): # 测试两个代理
login_service = await test_login_with_proxy(i)
if login_service:
print(f"✅ 代理{i+1} 连接测试成功,可以用于后续操作")
break
else:
print(f"⚠️ 代理{i+1} 连接测试失败,尝试下一个代理...")
if not login_service:
print("\n❌ 所有代理都无法连接,测试终止")
return
try:
# 验证登录状态虽然我们没有真正的登录但可以检查Cookie是否有效
print(f"\n🔍 验证当前浏览器状态...")
verify_result = await login_service.verify_login_status()
print(f"验证结果: {verify_result.get('message', '未知状态')}")
except Exception as e:
print(f"验证状态时出错: {str(e)}")
# 如果有cookies.json文件可以尝试使用已保存的cookies进行发布测试
cookies = None
try:
with open('cookies.json', 'r', encoding='utf-8') as f:
cookies = json.load(f)
print(f"\n✅ 成功读取 cookies.json包含 {len(cookies)} 个Cookie")
except FileNotFoundError:
print(f"\n⚠️ cookies.json 文件不存在,跳过发布测试")
print(" 如需测试发布功能请先登录获取Cookie")
if cookies:
# 使用第一个有效的代理进行发布测试
for i in range(2):
proxy_config = get_proxy_config(i)
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_config['server'].replace('http://', '', 1)}"
# 测试代理连接
temp_login = XHSLoginService()
try:
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await temp_login.init_browser(cookies=cookies, proxy=proxy_url, user_agent=user_agent)
# 验证登录状态
verify_result = await temp_login.verify_login_status()
if verify_result.get('logged_in'):
print(f"\n✅ 代理{i+1} + Cookie 组合验证成功,开始发布测试")
await test_publish_with_proxy(cookies, i)
break
else:
print(f"⚠️ 代理{i+1} + Cookie 组合验证失败")
except Exception as e:
print(f"⚠️ 代理{i+1} 连接测试失败: {str(e)}")
finally:
await temp_login.close_browser()
else:
print("\n❌ 所有代理都无法与Cookie配合使用发布测试终止")
# 清理资源
if login_service:
await login_service.close_browser()
print(f"\n{'='*60}")
print("🎉 测试完成!")
print("="*60)
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())


@@ -0,0 +1,224 @@
"""
Accurate Playwright proxy IP verification script
Checks that Playwright actually uses the proxy IP when the proxy requires authentication
"""
import asyncio
from playwright.async_api import async_playwright
import requests
async def get_my_ip_requests():
"""使用requests获取当前IP不使用代理"""
try:
response = requests.get('http://httpbin.org/ip', timeout=10)
if response.status_code == 200:
data = response.json()
return data.get('origin', 'Unknown')
except Exception as e:
print(f"获取本机IP失败: {str(e)}")
return None
async def get_ip_with_playwright_proxy_correct(proxy_url):
"""使用Playwright获取IP正确使用代理认证"""
try:
async with async_playwright() as p:
# 正确的代理配置格式,包含认证信息
proxy_parts = proxy_url.replace('http://', '').replace('https://', '').split('@')
if len(proxy_parts) == 2:
# 格式: username:password@host:port
auth_part = proxy_parts[0]
server_part = proxy_parts[1]
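# Split on the first ':' only, so a password that contains ':' still parses
username, password = auth_part.split(':', 1)
proxy_config = {
"server": f"http://{server_part}",
"username": username,
"password": password
}
# Mask the password so credentials do not end up in logs
print(f" Using proxy config: {dict(proxy_config, password='***')}")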
else:
# 如果没有认证信息,直接使用
proxy_config = {"server": proxy_url}
browser = await p.chromium.launch(headless=True, proxy=proxy_config)
context = await browser.new_context()
page = await context.new_page()
# 访问IP检测网站
await page.goto('http://httpbin.org/ip', wait_until='networkidle', timeout=15000)
# 获取页面内容
content = await page.content()
await browser.close()
# 尝试解析IP
import json
import re
json_match = re.search(r'\{.*\}', content, re.DOTALL)
if json_match:
try:
ip_data = json.loads(json_match.group())
return ip_data.get('origin', 'Unknown')
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
print(f" JSON解析失败原始内容: {content[:200]}...")
return 'JSON Parse Error'
print(f" 未找到JSON原始内容: {content[:200]}...")
return 'No JSON Found'
except Exception as e:
print(f" 通过Playwright+代理获取IP失败: {str(e)}")
return f'Error: {str(e)}'
async def test_proxy_formats():
"""测试不同的代理格式"""
print("="*60)
print("🔍 测试不同代理格式")
print("="*60)
# 从代理配置中获取代理信息
from damai_proxy_config import get_proxy_config
# 获取本机IP
print("1⃣ 获取本机IP...")
local_ip = await get_my_ip_requests()
print(f" 本机IP: {local_ip}")
for i in range(2):
print(f"\n2⃣ 测试代理 {i+1}...")
proxy_config = get_proxy_config(i)
print(f" 代理信息: {proxy_config}")
# 格式1: http://username:password@host:port
proxy_url_format1 = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_config['server'].replace('http://', '', 1)}"
print(f" 格式1 (完整URL): {proxy_url_format1}")
# 测试格式1
ip_with_proxy1 = await get_ip_with_playwright_proxy_correct(proxy_url_format1)
print(f" 使用格式1的IP: {ip_with_proxy1}")
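# Error results look like 'Error: <message>', so match them by prefix rather than by equality
if ip_with_proxy1 != local_ip and ip_with_proxy1 not in ['JSON Parse Error', 'No JSON Found'] and not ip_with_proxy1.startswith('Error:'):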
print(f" ✅ 格式1成功: IP已改变代理生效")
else:
print(f" ❌ 格式1失败: IP未改变或出错")
print()
async def test_direct_proxy_config():
"""测试直接使用代理配置对象"""
print("="*60)
print("🔍 测试直接使用代理配置对象")
print("="*60)
# 获取本机IP
print("1⃣ 获取本机IP...")
local_ip = await get_my_ip_requests()
print(f" 本机IP: {local_ip}")
from damai_proxy_config import get_proxy_config
for i in range(2):
print(f"\n2⃣ 测试代理 {i+1} (直接配置)...")
proxy_config = get_proxy_config(i)
# 构建Playwright代理配置对象
playwright_proxy_config = {
"server": proxy_config['server'],
"username": proxy_config['username'],
"password": proxy_config['password']
}
print(f" Playwright proxy config: {dict(playwright_proxy_config, password='***')}")
try:
async with async_playwright() as p:
browser = await p.chromium.launch(headless=True, proxy=playwright_proxy_config)
context = await browser.new_context()
page = await context.new_page()
# 访问IP检测网站
await page.goto('http://httpbin.org/ip', wait_until='networkidle', timeout=15000)
# 获取页面内容
content = await page.content()
await browser.close()
# 解析IP
import json
import re
json_match = re.search(r'\{.*\}', content, re.DOTALL)
if json_match:
try:
ip_data = json.loads(json_match.group())
ip_address = ip_data.get('origin', 'Unknown')
print(f" 代理{i+1} IP: {ip_address}")
if ip_address != local_ip:
print(f" ✅ 代理{i+1}成功: IP已改变代理生效")
else:
print(f" ❌ 代理{i+1}失败: IP未改变")
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
print(f" ❌ 代理{i+1} JSON解析失败: {content[:200]}...")
else:
print(f" ❌ 代理{i+1} 未找到IP信息: {content[:200]}...")
except Exception as e:
print(f" ❌ 代理{i+1}连接失败: {str(e)}")
def explain_proxy_formats():
    """Explain the different proxy formats."""
    print("="*60)
    print("📋 Proxy formats")
    print("="*60)
    print("\nTwo ways to describe a proxy for Playwright:")
    print("\n1️⃣ Dict format (recommended):")
    print("   proxy = {")
    print("       'server': 'http://proxy-server:port',")
    print("       'username': 'your_username',")
    print("       'password': 'your_password'")
    print("   }")
    print("   browser = await playwright.chromium.launch(proxy=proxy)")
    print("\n2️⃣ URL format with embedded credentials:")
    print("   proxy_url = 'http://username:password@proxy-server:port'")
    print("   # The credentials must be extracted and turned into the dict format")
    print("\n⚠️ Notes:")
    print("   - A URL string with embedded credentials cannot be used as proxy.server")
    print("   - The credentials must go into the separate username and password fields")
    print("   - The proxy server address should look like: http://host:port")
async def main():
    """Entry point."""
    explain_proxy_formats()
    print("\n" + "="*60)
    # Test the direct proxy config object
    await test_direct_proxy_config()
    print("\n" + "="*60)
    # Test the URL format
    await test_proxy_formats()
    print(f"\n{'='*60}")
    print("✅ Verification finished!")
    print("="*60)


if __name__ == "__main__":
    import sys
    if sys.platform == 'win32':
        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    asyncio.run(main())
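
Reviewer note: both test functions above repeat the same steps to pull the IP out of the page HTML (regex for a JSON object, `json.loads`, read `origin`). Those steps could live in one helper. A minimal sketch, assuming the IP service returns a small flat JSON object that the browser wraps in HTML; `extract_origin_ip` is a hypothetical name, and the non-greedy pattern is a deliberate change from the greedy `\{.*\}` used in the script so that the match stops at the first closing brace.

```python
import html
import json
import re


def extract_origin_ip(page_html):
    """Return the 'origin' (or 'ip') field of the first JSON object in the page, else None."""
    match = re.search(r"\{.*?\}", page_html, re.DOTALL)
    if not match:
        return None
    try:
        # Unescape entities in case the browser serialised quotes as &quot;
        data = json.loads(html.unescape(match.group()))
    except json.JSONDecodeError:
        return None
    return data.get("origin") or data.get("ip")
```

Returning `None` for every failure mode keeps the caller's comparison against the local IP simple, at the cost of not distinguishing "no JSON" from "bad JSON".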


@@ -0,0 +1,230 @@
"""
Playwright proxy IP verification script
Checks that the Playwright browser uses the proxy IP rather than the local IP
"""
import asyncio
from playwright.async_api import async_playwright
import requests
import json
async def get_my_ip_requests():
"""使用requests获取当前IP不使用代理"""
try:
response = requests.get('http://httpbin.org/ip', timeout=10)
if response.status_code == 200:
data = response.json()
return data.get('origin', 'Unknown')
except Exception as e:
print(f"获取本机IP失败: {str(e)}")
return None
async def get_browser_ip_via_playwright(proxy_url=None):
"""使用Playwright获取IP可选择是否使用代理"""
try:
async with async_playwright() as p:
# 启动浏览器
launch_kwargs = {
"headless": True, # 无头模式
}
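# If a proxy URL was given, use it. Credentials embedded in the URL are moved into
# separate username/password fields instead of being left inside "server".
if proxy_url:
from urllib.parse import urlsplit
parts = urlsplit(proxy_url)
server = f"{parts.scheme}://{parts.hostname}" + (f":{parts.port}" if parts.port else "")
launch_kwargs["proxy"] = {"server": server}
if parts.username:
launch_kwargs["proxy"]["username"] = parts.username
launch_kwargs["proxy"]["password"] = parts.password or ""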
browser = await p.chromium.launch(**launch_kwargs)
context = await browser.new_context()
page = await context.new_page()
# 访问IP检测网站
await page.goto('http://httpbin.org/ip', wait_until='networkidle', timeout=10000)
# 获取页面内容
content = await page.content()
# 关闭浏览器
await browser.close()
# 解析IP信息
try:
import re
import json
# 查找JSON内容
json_match = re.search(r'\{.*\}', content, re.DOTALL)
if json_match:
ip_data = json.loads(json_match.group())
return ip_data.get('origin', 'Unknown')
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
pass
return 'Parse Error'
except Exception as e:
print(f"通过Playwright获取IP失败: {str(e)}")
return None
async def verify_proxy_usage():
"""验证代理IP使用情况"""
print("="*60)
print("🔍 Playwright代理IP使用验证")
print("="*60)
# 1. 获取本机IP
print("\n1⃣ 获取本机IP地址...")
local_ip = await get_my_ip_requests()
if local_ip:
print(f" ✅ 本机IP: {local_ip}")
else:
print(" ❌ 无法获取本机IP")
return
# 2. 测试不使用代理时的IP
print("\n2⃣ 测试不使用代理时的IP...")
browser_ip_no_proxy = await get_browser_ip_via_playwright()
print(f" 🌐 Playwright无代理IP: {browser_ip_no_proxy}")
# 3. 测试使用代理时的IP
print("\n3⃣ 测试使用代理时的IP...")
# 从代理配置中获取代理信息
from damai_proxy_config import get_proxy_config
for i in range(2):
try:
proxy_config = get_proxy_config(i)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f" 代理{i+1}: {proxy_config['server']}")
# 获取使用代理时的IP
browser_ip_with_proxy = await get_browser_ip_via_playwright(proxy_url)
print(f" 🌐 Playwright使用代理{i+1}的IP: {browser_ip_with_proxy}")
# 比较IP地址
if browser_ip_with_proxy == local_ip:
print(f" ❌ 代理{i+1}测试失败: IP与本机IP相同代理未生效")
elif browser_ip_with_proxy == proxy_server.split(':')[0]: # 检查是否是代理服务器IP
print(f" ✅ 代理{i+1}测试成功: 使用了代理IP")
elif browser_ip_with_proxy != 'Parse Error' and browser_ip_with_proxy != local_ip:
print(f" ✅ 代理{i+1}测试成功: IP已改变代理生效")
else:
print(f" ⚠️ 代理{i+1}测试结果不确定: {browser_ip_with_proxy}")
except Exception as e:
print(f" ❌ 代理{i+1}测试出错: {str(e)}")
print() # 空行分隔
async def advanced_proxy_verification():
"""高级代理验证 - 使用多个IP检测服务"""
print("="*60)
print("🔬 高级代理IP验证")
print("="*60)
# IP检测服务列表
ip_services = [
'http://httpbin.org/ip',
'https://api.ipify.org?format=json',
'https://jsonip.com',
'https://httpbin.org/ip'
]
from damai_proxy_config import get_proxy_config
for i in range(2):
try:
proxy_config = get_proxy_config(i)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"\n📊 验证代理 {i+1}: {proxy_config['server']}")
print("-" * 50)
async with async_playwright() as p:
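# Pass the credentials as separate fields instead of embedding them in the server URL
launch_kwargs = {"headless": True, "proxy": {"server": proxy_config['server'], "username": proxy_config['username'], "password": proxy_config['password']}}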
browser = await p.chromium.launch(**launch_kwargs)
context = await browser.new_context()
page = await context.new_page()
for service in ip_services:
try:
print(f" 正在测试: {service}")
await page.goto(service, wait_until='networkidle', timeout=10000)
content = await page.content()
# 尝试解析IP
import re
import json
json_match = re.search(r'\{.*\}', content, re.DOTALL)
if json_match:
try:
data = json.loads(json_match.group())
ip = data.get('origin') or data.get('ip') or 'Unknown'
print(f"{service}: {ip}")
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
print(f"{service}: JSON解析失败")
else:
print(f"{service}: 未找到JSON数据")
except Exception as e:
print(f"{service}: {str(e)}")
await browser.close()
except Exception as e:
print(f"❌ 代理{i+1}高级验证失败: {str(e)}")
def show_proxy_format():
    """Show the proxy format."""
    print("="*60)
    print("🔧 Playwright proxy format reference")
    print("="*60)
    from damai_proxy_config import get_proxy_config
    for i in range(2):
        proxy_config = get_proxy_config(i)
        print(f"\nProxy {i+1}:")
        print(f"  Server: {proxy_config['server']}")
        print(f"  Username: {proxy_config['username']}")
        print(f"  Password: ***")  # never print the real password
        print(f"  Usage example:")
        print(f"    browser = await playwright.chromium.launch(")
        print(f"        proxy={{'server': '{proxy_config['server']}',")
        print(f"               'username': '{proxy_config['username']}',")
        print(f"               'password': '***'}}")
        print(f"    )")
async def main():
    """Entry point."""
    # Show the proxy format
    show_proxy_format()
    print("\n" + "="*60)
    # Basic verification
    await verify_proxy_usage()
    # Advanced verification (optional, may take a while)
    user_input = input("\nRun the advanced verification? It queries several IP services (y/N): ")
    if user_input.lower() == 'y':
        await advanced_proxy_verification()
    print(f"\n{'='*60}")
    print("✅ Verification finished!")
    print("="*60)


if __name__ == "__main__":
    import sys
    if sys.platform == 'win32':
        asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
    asyncio.run(main())
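
Reviewer note: several of these scripts print proxy settings for debugging, and those settings include the password. A small masking helper applied before printing would keep credentials out of terminal logs. A minimal sketch; `mask_proxy_dict` and `mask_proxy_url` are hypothetical names, and the URL variant keeps only scheme, user, host, and port, which is an assumption about what a proxy URL contains.

```python
from urllib.parse import urlsplit


def mask_proxy_dict(proxy):
    """Return a copy of the proxy dict with a non-empty password replaced by '***'."""
    return {k: ("***" if k == "password" and v else v) for k, v in proxy.items()}


def mask_proxy_url(proxy_url):
    """Return the proxy URL with its password replaced by '***', if it has one."""
    parts = urlsplit(proxy_url)
    if parts.password is None:
        return proxy_url
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"
    return f"{parts.scheme}://{parts.username}:***@{host}"
```

Both helpers leave their input untouched, so the real credentials can still be passed on to the browser or HTTP client.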

File diff suppressed because it is too large


@@ -181,9 +181,28 @@ class XHSPublishService:
local_images = []
# OSS域名前缀用于补充不完整的图片路径
oss_prefix = "https://bxmkb-beijing.oss-cn-beijing.aliyuncs.com/Images/"
print(f"\n正在处理 {len(images)} 张图片...", file=sys.stderr)
for i, img in enumerate(images):
# 检查是否需要补充OSS前缀
original_img = img
print(f" [调试] 处理图片 {i+1}: '{img}'", file=sys.stderr)
print(f" [调试] is_url={self.is_url(img)}, isabs={os.path.isabs(img)}", file=sys.stderr)
if not self.is_url(img) and not os.path.isabs(img):
# 不是URL也不是绝对路径检查是否需要补充OSS前缀
print(f" [调试] 不是URL也不是绝对路径", file=sys.stderr)
# 如果路径不包含协议且不以/开头可能是相对OSS路径
if '/' in img and not img.startswith('/'):
# 可能是OSS相对路径补充前缀
img = oss_prefix + img
print(f" ✅ 检测到相对路径补充OSS前缀: {original_img} -> {img}", file=sys.stderr)
else:
print(f" [调试] 不满足补充条件: '/' in img={('/' in img)}, not startswith('/')={not img.startswith('/')}", file=sys.stderr)
if self.is_url(img):
# 网络URL需要下载
try:
@@ -195,9 +214,25 @@ class XHSPublishService:
continue
else:
# 本地路径
if os.path.exists(img):
local_images.append(os.path.abspath(img))
print(f" ✅ 本地图片 [{i + 1}]: {os.path.basename(img)}", file=sys.stderr)
# 先尝试直接使用,如果不存在则尝试相对路径
abs_path = None
# 1. 尝试作为绝对路径
if os.path.isabs(img) and os.path.exists(img):
abs_path = img
# 2. 尝试相对于当前工作目录
elif os.path.exists(img):
abs_path = os.path.abspath(img)
# 3. 尝试相对于 static 目录
elif os.path.exists(os.path.join('static', img)):
abs_path = os.path.abspath(os.path.join('static', img))
# 4. 尝试相对于 ../go_backend/static 目录
elif os.path.exists(os.path.join('..', 'go_backend', 'static', img)):
abs_path = os.path.abspath(os.path.join('..', 'go_backend', 'static', img))
if abs_path:
local_images.append(abs_path)
print(f" ✅ 本地图片 [{i + 1}]: {os.path.basename(abs_path)} ({abs_path})", file=sys.stderr)
else:
print(f" ⚠️ 本地图片不存在: {img}", file=sys.stderr)