sjk
2026-01-07 22:55:12 +08:00
parent cb267e8d5e
commit 4720ab2a15
76 changed files with 3110 additions and 7168 deletions


@@ -1,123 +0,0 @@
# Xiaohongshu Login CLI Tool
## Overview
This is a Python CLI tool for Xiaohongshu login that the Go service can call directly.
With this tool, there is no need to start a separate Python web service.
## Usage
### 1. Send a verification code
```bash
python xhs_cli.py send_code <phone> [country_code]
```
Example:
```bash
python xhs_cli.py send_code 13800138000 +86
```
Returns JSON:
```json
{
"success": true,
"message": "验证码发送成功"
}
```
### 2. Log in
```bash
python xhs_cli.py login <phone> <code> [country_code]
```
Example:
```bash
python xhs_cli.py login 13800138000 123456 +86
```
Returns JSON:
```json
{
"success": true,
"user_info": {...},
"cookies": {...},
"url": "https://www.xiaohongshu.com/"
}
```
### 3. Inject cookies (verify login status)
```bash
python xhs_cli.py inject_cookies '<cookies_json>'
```
Example:
```bash
python xhs_cli.py inject_cookies '[{"name":"web_session","value":"xxx","domain":".xiaohongshu.com"}]'
```
Returns JSON:
```json
{
"success": true,
"logged_in": true,
"cookies": {...},
"user_info": {...}
}
```
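## Go Service Integration
The Go service now calls the Python CLI script directly, so the Python web service does not need to be started.
### Changed files
1. **backend/xhs_cli.py** (new)
- Command-line interface tool
2. **go_backend/service/xhs_service.go** (modified)
- Calls the Python script via `exec.Command`
- No longer calls the Python service over HTTP
3. **go_backend/service/employee_service.go** (modified)
- Calls the Python script via `exec.Command`
### Advantages
- ✅ Only one Go service needs to be started
- ✅ Simpler deployment, with no need to manage multiple service processes
- ✅ Less network overhead
- ✅ Easier to debug and maintain
## Dependencies
Make sure the Python dependencies are installed:
```bash
cd backend
pip install -r requirements.txt
```
Main dependencies:
- playwright
- asyncio (part of the Python standard library, so no separate install is needed)
## Notes
1. The Python command must be available on the system PATH
2. Make sure `xhs_login.py` and `xhs_cli.py` are in the same directory
3. The Go service looks for the Python scripts under the relative path `../backend`
4. All output is JSON, so the Go service can parse it easily
## Error Handling
If execution fails, a JSON object containing the error message is returned:
```json
{
"success": false,
"error": "错误描述信息"
}
```
The Go service captures stderr output and returns it as part of the error message.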
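The success and error shapes above can be handled by one small parser. The following is a minimal sketch in Python rather than Go, assuming only the `success` and `error` fields shown in this README; `XhsCliError` and `parse_cli_output` are hypothetical names, not part of `xhs_cli.py`.

```python
import json


class XhsCliError(RuntimeError):
    """Raised when the CLI output is not JSON or reports success=false."""


def parse_cli_output(stdout, stderr=""):
    """Parse the JSON printed by the CLI and raise on failure."""
    try:
        result = json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise XhsCliError(f"output is not valid JSON: {stdout!r}") from exc
    if not isinstance(result, dict):
        raise XhsCliError(f"expected a JSON object, got {type(result).__name__}")
    if not result.get("success"):
        message = result.get("error", "unknown error")
        # Append stderr, mirroring the note that stderr becomes part of the error.
        if stderr.strip():
            message = f"{message} (stderr: {stderr.strip()})"
        raise XhsCliError(message)
    return result
```

A caller would typically run the script with `subprocess.run([...], capture_output=True, text=True)` and pass `proc.stdout` and `proc.stderr` to `parse_cli_output`.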


@@ -1,313 +0,0 @@
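# Xiaohongshu Note Publishing Script: Usage Guide
## Features
`xhs_publish.py` is a Python script that publishes Xiaohongshu notes automatically. It authenticates with cookies and completes the publishing of image-and-text notes without manual steps.
## Environment Setup
### 1. Install dependencies
```bash
cd backend
pip install -r requirements.txt
```
Main dependencies:
- playwright (browser automation)
- asyncio (asynchronous processing; part of the Python standard library)
### 2. Install the browser driver
```bash
playwright install chromium
```
## Usage
### Option 1: Use a config file (recommended)
#### 1. Prepare the config file
Copy `publish_config_example.json` (for example to `publish_config.json`, the name used in the next step) and replace the values with your own:
```json
{
"cookies": [
{
"name": "a1",
"value": "your_cookie_value_here",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": -1,
"httpOnly": false,
"secure": false,
"sameSite": "Lax"
}
],
"title": "Note title",
"content": "Note content",
"images": [
"D:/path/to/image1.jpg",
"D:/path/to/image2.jpg"
],
"tags": [
"tag1",
"tag2"
]
}
```
#### 2. Run the publish command
```bash
python xhs_publish.py --config publish_config.json
```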
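If the config file is generated by a program instead of edited by hand, it could be written along these lines. This is a minimal sketch that assumes only the five top-level fields shown in the example config; `write_publish_config` is a hypothetical helper, not something the script provides.

```python
import json
from pathlib import Path


def write_publish_config(path, cookies, title, content, images=None, tags=None):
    """Write a publish config with the fields used in the example config."""
    config = {
        "cookies": cookies,
        "title": title,
        "content": content,
        "images": list(images or []),
        "tags": list(tags or []),
    }
    # ensure_ascii=False keeps Chinese titles and tags readable in the file.
    text = json.dumps(config, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")
    return config
```

The resulting file can then be passed to the script with `--config`.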
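### Option 2: Command-line arguments
```bash
python xhs_publish.py \
--cookies '[{"name":"a1","value":"xxx","domain":".xiaohongshu.com"}]' \
--title "Note title" \
--content "Note content" \
--images '["D:/image1.jpg","D:/image2.jpg"]' \
--tags '["tag1","tag2"]'
```
## Parameters
### cookies (required)
An array of cookies. Each cookie object contains the following fields:
- `name`: cookie name
- `value`: cookie value
- `domain`: domain (usually `.xiaohongshu.com`)
- `path`: path (usually `/`)
- `expires`: expiration time (-1 means a session cookie)
- `httpOnly`: whether the cookie is HTTP-only
- `secure`: whether the cookie is sent only over secure connections
- `sameSite`: same-site policy (Lax/Strict/None)
**Important cookies (required):**
- `a1`: user identity authentication
- `webId`: device identifier
- `web_session`: session information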
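Before invoking the script, it can be useful to confirm that the three required cookies are present. A minimal sketch, assuming the cookie list has the shape described above; `missing_required_cookies` is a hypothetical helper name.

```python
REQUIRED_COOKIES = ("a1", "webId", "web_session")  # names listed as required above


def missing_required_cookies(cookies):
    """Return the required cookie names that are absent or have an empty value."""
    present = {c.get("name") for c in cookies if c.get("value")}
    return [name for name in REQUIRED_COOKIES if name not in present]
```

An empty return value means all three cookies were found. It does not prove they are still valid, which only a login check can tell.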
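### title (required)
The note title, a string.
**Example:**
```
"💧A summer must-have! This 2-yuan lemonade is surprisingly good"
```
### content (required)
The note body, a string. Line breaks written as `\n` are supported.
**Example:**
```
"Today I'm sharing a super affordable summer drink!\n\nMixue Bingcheng's lemonade costs only 2 yuan a cup, which is great value"
```
### images (optional)
An array of image file paths. Local absolute paths are supported.
**Requirements:**
- Images must be local files
- Formats such as jpg, png, and gif are supported
- At most 9 images can be uploaded
- Recommended size: 800x600 or larger
**Example:**
```json
[
"D:/project/Work/ai_xhs/backend/temp_uploads/image1.jpg",
"D:/project/Work/ai_xhs/backend/temp_uploads/image2.jpg"
]
```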
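The image requirements can be checked up front so that a bad path fails fast instead of partway through a browser session. A minimal sketch covering only the count limit and the local absolute path rule; `check_images` and `MAX_IMAGES` are hypothetical names, and the script itself may validate differently.

```python
from pathlib import Path

MAX_IMAGES = 9  # upload limit stated in the requirements


def check_images(paths):
    """Return a list of problems; an empty list means the paths look usable."""
    problems = []
    if len(paths) > MAX_IMAGES:
        problems.append(f"too many images: {len(paths)} > {MAX_IMAGES}")
    for raw in paths:
        p = Path(raw)
        # is_absolute() follows the path rules of the OS running this check.
        if not p.is_absolute():
            problems.append(f"not an absolute path: {raw}")
        elif not p.is_file():
            problems.append(f"file not found: {raw}")
    return problems
```

Format checks are left out on purpose, because the list of supported formats above is open-ended.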
### tags (optional)
An array of tags. The `#` prefix is added automatically.
**Example:**
```json
["summer refreshment", "drinks", "lemonade"]
```
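The README does not show how the script adds the `#` prefix. The following sketch is one plausible normalization, written as an assumption rather than a description of `xhs_publish.py`; `normalize_tags` is a hypothetical name.

```python
def normalize_tags(tags):
    """Trim each tag, drop any leading '#', skip empties and duplicates, then add one '#'."""
    cleaned = []
    for tag in tags:
        text = tag.strip().lstrip("#").strip()
        if text and text not in cleaned:
            cleaned.append(text)
    return [f"#{text}" for text in cleaned]
```

Stripping an existing `#` first means a tag that already carries the prefix does not end up with two.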
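## How to Get Cookies
### Method 1: Use the login script
```bash
python xhs_cli.py login <phone> <code>
```
After a successful login, the cookies are saved automatically to `cookies.json`.
### Method 2: Copy them manually from the browser
1. Log in to the Xiaohongshu web version in your browser
2. Open the developer tools (F12)
3. Switch to the Network tab
4. Refresh the page
5. Pick any request and look at its Request Headers
6. Copy the value of the Cookie field
7. Convert it to JSON with an online tool or a script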
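The final conversion step can be done with a few lines of Python. This is a minimal sketch that turns a copied `Cookie` header value into the array-of-objects shape used elsewhere in this README; the default `domain` and `path` values are assumptions taken from the field descriptions, and `cookie_header_to_list` is a hypothetical name.

```python
import json


def cookie_header_to_list(header, domain=".xiaohongshu.com"):
    """Convert 'a=1; b=2' into a list of cookie dicts."""
    cookies = []
    for part in header.split(";"):
        # Split on the first '=' only, since cookie values may contain '='.
        name, sep, value = part.strip().partition("=")
        if not sep or not name:
            continue  # skip empty or malformed fragments
        cookies.append({"name": name, "value": value, "domain": domain, "path": "/"})
    return cookies


if __name__ == "__main__":
    print(json.dumps(cookie_header_to_list("a1=abc; webId=xyz")))
```

The printed JSON can be pasted into the `cookies` field of the config file or passed to `inject_cookies`.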
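### Method 3: Verify with cookie injection
```bash
python xhs_cli.py inject_cookies '<cookies_json>'
```
## Return Values
### Success example
```json
{
"success": true,
"message": "笔记发布成功",
"url": "https://www.xiaohongshu.com/explore/xxxx"
}
```
### Failure example
```json
{
"success": false,
"error": "Cookie已失效或未登录"
}
```
## Notes
### 1. Cookie validity
- Cookies expire after a while
- Log in again periodically to get fresh cookies
- Use the cookie injection check to verify their status
### 2. Image upload
- Make sure the image files exist and are readable
- Use absolute paths for images
- On Windows, use `/` or `\\` as the path separator
### 3. Publishing limits
- Xiaohongshu may limit how often you can publish
- Space out posts to avoid being throttled
- Content must comply with the Xiaohongshu community guidelines
### 4. Error handling
Common errors and fixes:
- **"Cookie已失效"** (cookie expired): log in again to get fresh cookies
- **"图片文件不存在"** (image file not found): check that the image path is correct
- **"未找到发布按钮"** (publish button not found): the Xiaohongshu page structure may have changed, so the selectors need updating
- **"输入内容失败"** (failed to enter content): the wait was too short, so increase the delay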
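## Integration with the Go Backend
Call the script from the Go backend like this:
```go
import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// PublishNote writes the publish config to a temporary file and runs
// xhs_publish.py on it. The Cookie type is defined elsewhere in the backend.
func PublishNote(cookies []Cookie, title, content string, images, tags []string) error {
	// Build the config payload
	config := map[string]interface{}{
		"cookies": cookies,
		"title":   title,
		"content": content,
		"images":  images,
		"tags":    tags,
	}
	// Save it to a temporary file
	configFile := "temp_publish_config.json"
	data, err := json.Marshal(config)
	if err != nil {
		return fmt.Errorf("marshal config: %w", err)
	}
	if err := os.WriteFile(configFile, data, 0644); err != nil {
		return fmt.Errorf("write config: %w", err)
	}
	// Call the Python script
	cmd := exec.Command("python", "backend/xhs_publish.py", "--config", configFile)
	output, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("run xhs_publish.py: %w: %s", err, output)
	}
	// Parse the result
	var result struct {
		Success bool   `json:"success"`
		Error   string `json:"error"`
	}
	if err := json.Unmarshal(output, &result); err != nil {
		return fmt.Errorf("parse result: %w", err)
	}
	if !result.Success {
		return errors.New(result.Error)
	}
	return nil
}
```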
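## Development and Debugging
### Enable visible browser mode
Change the `headless` argument in `xhs_login.py`:
```python
self.browser = await self.playwright.chromium.launch(
    headless=False,  # set to False to watch the browser as it works
    args=['--disable-blink-features=AutomationControlled']
)
```
### View detailed logs
The script prints detailed execution logs to the console, including:
- Browser initialization
- Login status verification
- Image upload progress
- Content input status
- Publish result
## FAQ
### Q: Why don't images show up after upload?
A: The upload may be taking a long time. The script already includes extra wait time. If the problem persists, adjust the wait times in `xhs_login.py`.
### Q: How do I publish several notes in a batch?
A: Prepare multiple config files and call the script in a loop:
```bash
for config in publish_config_*.json; do
python xhs_publish.py --config "$config"
sleep 60 # wait 60 seconds between posts
done
```
### Q: How long do cookies last?
A: Xiaohongshu cookies usually expire after 7-30 days, depending on each cookie's expiration setting.
## Support
If you run into problems, check:
1. The script's execution logs
2. Whether the Xiaohongshu page structure has changed
3. Whether the cookies are still valid
4. Whether the image files exist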


@@ -12,13 +12,13 @@ import sys
class BrowserPool:
"""浏览器池管理器(单例模式)"""
def __init__(self, idle_timeout: int = 1800, max_instances: int = 5, headless: bool = True):
def __init__(self, idle_timeout: int = 1800, max_instances: int = 20, headless: bool = True):
"""
初始化浏览器池
Args:
idle_timeout: 空闲超时时间默认30分钟已禁用保持常驻
max_instances: 最大浏览器实例数,默认5个
max_instances: 最大浏览器实例数,默认20个支持更多并发
headless: 是否使用无头模式False为有头模式方便调试
"""
self.playwright = None
@@ -37,20 +37,29 @@ class BrowserPool:
self.temp_browsers: Dict[str, Dict] = {} # {session_id: {browser, context, page, created_at}}
self.temp_lock = asyncio.Lock()
# 请求队列当超过max_instances时排队等待
self.waiting_queue: asyncio.Queue = asyncio.Queue()
self.queue_processing = False
# 扫码登录专用页面隔离池共享浏览器和context但每个用户独立page
self.qrcode_pages: Dict[str, Dict] = {} # {session_id: {page, created_at}}
self.qrcode_lock = asyncio.Lock()
print(f"[浏览器池] 已创建,常驻模式(不自动清理),最大实例数: {max_instances}", file=sys.stderr)
async def get_browser(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
async def get_browser(self, cookies: Optional[list] = None, proxy: Optional[dict] = None,
user_agent: Optional[str] = None, session_id: Optional[str] = None,
headless: Optional[bool] = None) -> tuple[Browser, BrowserContext, Page]:
headless: Optional[bool] = None, force_new: bool = False) -> tuple[Browser, BrowserContext, Page]:
"""
获取浏览器实例(复用或新建)
Args:
cookies: 可选的Cookie列表
proxy: 可选的代理地址
proxy: 可选的代理配置,格式: {"server": "...", "username": "...", "password": "..."}
user_agent: 可选的自定义User-Agent
session_id: 会话 ID用于区分不同的并发请求
headless: 可选的headless模式为None时使用默认配置
force_new: 是否强制创建全新浏览器即使session_id已存在
Returns:
(browser, context, page) 三元组
@@ -83,16 +92,37 @@ class BrowserPool:
else:
async with self.temp_lock:
# 首先检查是否已存在该session_id的临时浏览器
if session_id in self.temp_browsers:
if session_id in self.temp_browsers and not force_new:
print(f"[浏览器池] 复用会话 {session_id} 的临时浏览器", file=sys.stderr)
browser_info = self.temp_browsers[session_id]
return browser_info["browser"], browser_info["context"], browser_info["page"]
# 强制创建全新浏览器:先释放旧的
if force_new and session_id in self.temp_browsers:
print(f"[浏览器池] force_new=True释放旧的会话 {session_id}", file=sys.stderr)
old_browser_info = self.temp_browsers[session_id]
try:
await old_browser_info["page"].close()
await old_browser_info["context"].close()
await old_browser_info["browser"].close()
except Exception as e:
print(f"[浏览器池] 释放旧浏览器失败: {str(e)}", file=sys.stderr)
finally:
del self.temp_browsers[session_id]
# 检查是否超过最大实例数
if len(self.temp_browsers) >= self.max_instances - 1: # -1 留给主浏览器
print(f"[浏览器池] ⚠️ 已达最大实例数 ({self.max_instances}),等待释放...", file=sys.stderr)
# TODO: 可以实现等待队列,这里直接报错
raise Exception(f"浏览器实例数已满,请稍后再试")
# Wait up to 30 seconds, checking once per second
for i in range(30):
await asyncio.sleep(1)
if len(self.temp_browsers) < self.max_instances - 1:
print(f"[浏览器池] 检测到空闲实例,继续创建", file=sys.stderr)
break
else:
# 超时30秒仍满返回错误
raise Exception(f"浏览器实例数已满,请稍后再试")
print(f"[浏览器池] 为会话 {session_id} 创建临时浏览器 ({len(self.temp_browsers)+1}/{self.max_instances-1})", file=sys.stderr)
@@ -131,9 +161,9 @@ class BrowserPool:
await self.close()
return False
async def _init_browser(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
async def _init_browser(self, cookies: Optional[list] = None, proxy: Optional[dict] = None,
user_agent: Optional[str] = None):
"""初始化新浏览器实例"""
"""初始化新浏览器实例。proxy为dict格式: {"server": "...", "username": "...", "password": "..."}"""
try:
# 启动Playwright
if not self.playwright:
@@ -202,7 +232,7 @@ class BrowserPool:
],
}
if proxy:
launch_kwargs["proxy"] = {"server": proxy}
launch_kwargs["proxy"] = proxy # proxy已经是dict格式直接使用
self.browser = await self.playwright.chromium.launch(**launch_kwargs)
print("[浏览器池] Chromium浏览器启动成功", file=sys.stderr)
@@ -215,9 +245,9 @@ class BrowserPool:
await self.close()
raise
async def _create_new_context(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
async def _create_new_context(self, cookies: Optional[list] = None, proxy: Optional[dict] = None,
user_agent: Optional[str] = None):
"""创建新的浏览器上下文"""
"""创建新的浏览器上下文。proxy为dict格式: {"server": "...", "username": "...", "password": "..."}"""
try:
# 关闭旧上下文
if self.context:
@@ -231,6 +261,62 @@ class BrowserPool:
}
self.context = await self.browser.new_context(**context_kwargs)
# 注入反检测脚本(关键)
await self.context.add_init_script("""
// 移除webdriver标记
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
});
// 隐藏chrome自动化特征
window.chrome = {
runtime: {}
};
// 模拟plugins
Object.defineProperty(navigator, 'plugins', {
get: () => [
{
0: {type: "application/x-google-chrome-pdf", suffixes: "pdf", description: "Portable Document Format"},
description: "Portable Document Format",
filename: "internal-pdf-viewer",
length: 1,
name: "Chrome PDF Plugin"
},
{
0: {type: "application/pdf", suffixes: "pdf", description: ""},
description: "",
filename: "mhjfbmdgcfjbbpaeojofohoefgiehjai",
length: 1,
name: "Chrome PDF Viewer"
}
],
});
// 模拟permissions API
const originalQuery = window.navigator.permissions.query;
window.navigator.permissions.query = (parameters) => (
parameters.name === 'notifications' ?
Promise.resolve({ state: Notification.permission }) :
originalQuery(parameters)
);
// 阻止检测自动化的网络请求
const originalFetch = window.fetch;
window.fetch = function(...args) {
const url = args[0];
if (typeof url === 'string' && (
url.includes('127.0.0.1:9222') ||
url.includes('localhost:9222') ||
url.includes('chrome-extension://invalid')
)) {
return Promise.reject(new Error('blocked'));
}
return originalFetch.apply(this, args);
};
""")
print("[浏览器池] 已注入反检测脚本", file=sys.stderr)
# 注入Cookie
if cookies:
await self.context.add_cookies(cookies)
@@ -402,13 +488,13 @@ class BrowserPool:
except:
pass
async def _create_temp_browser(self, cookies: Optional[list] = None, proxy: Optional[str] = None,
async def _create_temp_browser(self, cookies: Optional[list] = None, proxy: Optional[dict] = None,
user_agent: Optional[str] = None, headless: bool = True) -> tuple[Browser, BrowserContext, Page]:
"""创建临时浏览器实例(用于并发请求)
Args:
cookies: Cookie列表
proxy: 代理地址
proxy: 代理配置,格式: {"server": "...", "username": "...", "password": "..."}
user_agent: 自定义User-Agent
headless: 是否使用无头模式
"""
@@ -425,14 +511,14 @@ class BrowserPool:
# 启动浏览器(临时实例,性能优先配置)
launch_kwargs = {
"headless": headless, # 使用传入的headless参数
"headless": headless,
"args": [
'--disable-blink-features=AutomationControlled',
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
# 性能优化
# 性能优化 - 减少资源占用
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process',
'--disable-site-isolation-trials',
@@ -442,20 +528,19 @@ class BrowserPool:
'--disable-renderer-backgrounding',
'--disable-background-networking',
# 缓存优化
'--disk-cache-size=268435456',
'--media-cache-size=134217728',
# 缓存优化 - 减小缓存以节省内存
'--disk-cache-size=67108864', # 64MB原256MB
'--media-cache-size=33554432', # 32MB原128MB
# 渲染优化
'--enable-gpu-rasterization',
'--enable-zero-copy',
'--ignore-gpu-blocklist',
'--enable-accelerated-2d-canvas',
# 渲染优化 - 禁用GPU以减少资源占用
'--disable-gpu',
'--disable-accelerated-2d-canvas',
'--disable-accelerated-video-decode',
# 网络优化
'--enable-quic',
'--enable-tcp-fast-open',
'--max-connections-per-host=10',
'--max-connections-per-host=6', # 减少连接数原10
# 减少不必要的功能
'--disable-extensions',
@@ -466,6 +551,9 @@ class BrowserPool:
'--disable-prompt-on-repost',
'--disable-domain-reliability',
'--disable-component-update',
'--disable-plugins',
'--disable-sync',
'--disable-translate',
# 界面优化
'--hide-scrollbars',
@@ -473,21 +561,82 @@ class BrowserPool:
'--no-first-run',
'--no-default-browser-check',
'--metrics-recording-only',
'--force-color-profile=srgb',
# 内存优化
'--js-flags=--max-old-space-size=512', # 限制JS堆内存
],
}
if proxy:
launch_kwargs["proxy"] = {"server": proxy}
launch_kwargs["proxy"] = proxy # proxy已经是dict格式直接使用
browser = await self.playwright.chromium.launch(**launch_kwargs)
# 创建上下文
# 创建上下文(使用隐身模式,确保无痕迹)
context_kwargs = {
"viewport": {'width': 1280, 'height': 720},
"user_agent": user_agent or 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
"no_viewport": False,
"ignore_https_errors": True,
# 不使用storage_state确保完全干净
}
context = await browser.new_context(**context_kwargs)
# 注入反检测脚本(关键)
await context.add_init_script("""
// 移除webdriver标记
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
});
// 隐藏chrome自动化特征
window.chrome = {
runtime: {}
};
// 模拟plugins
Object.defineProperty(navigator, 'plugins', {
get: () => [
{
0: {type: "application/x-google-chrome-pdf", suffixes: "pdf", description: "Portable Document Format"},
description: "Portable Document Format",
filename: "internal-pdf-viewer",
length: 1,
name: "Chrome PDF Plugin"
},
{
0: {type: "application/pdf", suffixes: "pdf", description: ""},
description: "",
filename: "mhjfbmdgcfjbbpaeojofohoefgiehjai",
length: 1,
name: "Chrome PDF Viewer"
}
],
});
// 模拟permissions API
const originalQuery = window.navigator.permissions.query;
window.navigator.permissions.query = (parameters) => (
parameters.name === 'notifications' ?
Promise.resolve({ state: Notification.permission }) :
originalQuery(parameters)
);
// 阻止检测自动化的网络请求
const originalFetch = window.fetch;
window.fetch = function(...args) {
const url = args[0];
if (typeof url === 'string' && (
url.includes('127.0.0.1:9222') ||
url.includes('localhost:9222') ||
url.includes('chrome-extension://invalid')
)) {
return Promise.reject(new Error('blocked'));
}
return originalFetch.apply(this, args);
};
""")
print("[临时浏览器] 已注入反检测脚本", file=sys.stderr)
# 注入Cookie
if cookies:
await context.add_cookies(cookies)
@@ -516,6 +665,54 @@ class BrowserPool:
finally:
del self.temp_browsers[session_id]
async def get_qrcode_page(self, session_id: str) -> Page:
"""
为扫码登录获取页面(页面隔离模式)
多个用户共享同一个浏览器实例但每个用户有独立的page
这样可以大大减少浏览器崩溃风险
Args:
session_id: 会话 ID
Returns:
Page 对象
"""
async with self.qrcode_lock:
# 复用已有的page
if session_id in self.qrcode_pages:
print(f"[扫码页面池] 复用会话 {session_id} 的页面", file=sys.stderr)
return self.qrcode_pages[session_id]["page"]
# 确保主浏览器已初始化
async with self.init_lock:
if not await self._is_browser_alive():
print("[扫码页面池] 主浏览器未初始化,创建中...", file=sys.stderr)
await self._init_browser()
# 从主context创建新page
print(f"[扫码页面池] 为会话 {session_id} 创建新页面 ({len(self.qrcode_pages)+1} 个活跃页面)", file=sys.stderr)
page = await self.context.new_page()
self.qrcode_pages[session_id] = {
"page": page,
"created_at": time.time()
}
return page
async def release_qrcode_page(self, session_id: str):
"""释放扫码登录页面"""
async with self.qrcode_lock:
if session_id in self.qrcode_pages:
page_info = self.qrcode_pages[session_id]
try:
await page_info["page"].close()
print(f"[扫码页面池] 已释放会话 {session_id} 的页面", file=sys.stderr)
except Exception as e:
print(f"[扫码页面池] 释放页面异常: {str(e)}", file=sys.stderr)
finally:
del self.qrcode_pages[session_id]
def get_stats(self) -> Dict[str, Any]:
"""获取浏览器池统计信息"""
return {
@@ -524,6 +721,7 @@ class BrowserPool:
"page_alive": self.page is not None,
"is_preheated": self.is_preheated,
"temp_browsers_count": len(self.temp_browsers),
"qrcode_pages_count": len(self.qrcode_pages),
"max_instances": self.max_instances,
"last_used_time": self.last_used_time,
"idle_seconds": int(time.time() - self.last_used_time) if self.last_used_time > 0 else 0,
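
The capacity wait added to `get_browser` in this file (poll once per second for up to 30 seconds, then raise) can be isolated as a small coroutine. This is a minimal sketch of that polling pattern only; `wait_for_slot` and its parameters are hypothetical names. In the diff the check runs inside `async with self.temp_lock`, and the excerpt does not show whether the release path needs the same lock. If it does, the wait would have to happen outside the lock for a freed slot to be observed.

```python
import asyncio


async def wait_for_slot(active, limit, timeout_s=30.0, interval_s=1.0):
    """Poll until len(active) < limit, or raise after roughly timeout_s seconds."""
    attempts = max(1, int(timeout_s / interval_s))
    for _ in range(attempts):
        if len(active) < limit:
            return
        await asyncio.sleep(interval_s)
    # One last check after the final sleep before giving up.
    if len(active) >= limit:
        raise RuntimeError("browser pool is full, try again later")
```

Here `active` stands in for `self.temp_browsers` and `limit` for `self.max_instances - 1`.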


@@ -32,7 +32,7 @@ login:
# ========== 定时发布调度器配置 ==========
scheduler:
enabled: true # 是否启用定时任务
enabled: false # 是否启用定时任务
cron: "*/5 * * * * *" # Cron表达式(秒 分 时 日 月 周) - 每5秒执行一次(开发环境测试)
max_concurrent: 2 # 最大并发发布数
publish_timeout: 300 # 发布超时时间(秒)


@@ -28,7 +28,7 @@ browser_pool:
# ========== 登录/绑定功能配置 ==========
login:
headless: false # 登录/绑定时的浏览器模式: false=有头模式(配合Xvfb避免被检测)true=无头模式
page: "home" # 登录页面类型: creator=创作者中心(creator.xiaohongshu.com/login), home=小红书首页(www.xiaohongshu.com)
page: "creator" # 登录页面类型: creator=创作者中心(creator.xiaohongshu.com/login), home=小红书首页(www.xiaohongshu.com)
# ========== 定时发布调度器配置 ==========
scheduler:
@@ -49,7 +49,7 @@ scheduler:
# ========== 代理池配置 ==========
proxy_pool:
enabled: false # 默认关闭,按需开启
enabled: true # 启用代理池避免IP被风控
api_url: "http://api.tianqiip.com/getip?secret=lu29e593&num=1&type=txt&port=1&mr=1&sign=4b81a62eaed89ba802a8f34053e2c964"
# ========== 阿里云短信配置 ==========


@@ -9,21 +9,19 @@
import requests
proxy_ip = "36.137.177.131:50001";
proxy_ip = "210.51.27.194:50001";
# 用户名密码认证(私密代理/独享代理)
username = "qqwvy0"
password = "mun3r7xz"
username = "hb6su3"
password = "acv2ciow"
proxies = {
"http": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip},
"https": "http://%(user)s:%(pwd)s@%(proxy)s/" % {"user": username, "pwd": password, "proxy": proxy_ip}
}
print(proxies)
# 要访问的目标网页
target_url = "https://creator.xiaohongshu.com/login";
target_url = "https://www.xiaohongshu.com/explore";
# 使用代理IP发送请求
response = requests.get(target_url, proxies=proxies)


@@ -1,23 +1,62 @@
"""
大麦固定代理IP配置
用于在无头浏览器中使用固定代理IP
小红书代理IP配置
用于在无头浏览器中使用代理IP防止风控
"""
# 大麦固定代理IP池
DAMAI_PROXY_POOL = [
# 代理IP池配置
PROXY_POOL = [
{
"name": "大麦代理1",
"name": "代理01",
"server": "http://60.188.239.186:3101", # 如果支持SOCKS5改为 socks5://...
"username": "46vTEIvZt",
"password": "gM33AFND",
"enabled": False # HTTP代理不支持HTTPS隧道暂时禁用
},
{
"name": "代理02",
"server": "http://222.94.104.232:4201",
"username": "46azrCOcF",
"password": "WKyKYE6P",
"enabled": False # HTTP代理不支持HTTPS隧道暂时禁用
},
{
"name": "代理03",
"server": "http://125.94.108.2:4601",
"username": "46eX9tk99",
"password": "odtvKjpl",
"enabled": False # HTTP代理不支持HTTPS隧道暂时禁用
},
{
"name": "代理04",
"server": "http://113.24.66.191:3601",
"username": "46r74jRaD",
"password": "WjOXiXjq",
"enabled": False # HTTP代理不支持HTTPS隧道暂时禁用
},
{
"name": "代理05",
"server": "http://113.249.158.23:4401",
"username": "46oKu9Ovb",
"password": "4kWUGkNv",
"enabled": False # HTTP代理不支持HTTPS隧道暂时禁用
}, {
"name": "天启01",
"server": "http://36.137.177.131:50001",
"username": "qqwvy0",
"password": "mun3r7xz",
"enabled": True
},
{
"name": "大麦代理2",
"enabled": False
}, {
"name": "天启02",
"server": "http://111.132.40.72:50002",
"username": "ih3z07",
"password": "078bt7o5",
"enabled": True
"enabled": False
}, {
"name": "天启03",
"server": "http://210.51.27.194:50001",
"username": "hb6su3",
"password": "acv2ciow",
"enabled": False
}
]
@@ -27,18 +66,18 @@ def get_proxy_config(index: int = 0) -> dict:
获取指定索引的代理配置
Args:
index: 代理索引0或1
index: 代理索引0-4
Returns:
代理配置字典包含server、username、password
"""
if index < 0 or index >= len(DAMAI_PROXY_POOL):
raise ValueError(f"代理索引无效: {index},有效范围: 0-{len(DAMAI_PROXY_POOL)-1}")
proxy = DAMAI_PROXY_POOL[index]
if index < 0 or index >= len(PROXY_POOL):
raise ValueError(f"代理索引无效: {index},有效范围: 0-{len(PROXY_POOL) - 1}")
proxy = PROXY_POOL[index]
if not proxy.get("enabled", True):
raise ValueError(f"代理已禁用: {proxy['name']}")
return {
"server": proxy["server"],
"username": proxy["username"],
@@ -60,7 +99,7 @@ def get_all_enabled_proxies() -> list:
"password": p["password"],
"name": p["name"]
}
for p in DAMAI_PROXY_POOL
for p in PROXY_POOL
if p.get("enabled", True)
]
@@ -73,11 +112,11 @@ def get_random_proxy() -> dict:
代理配置字典
"""
import random
enabled_proxies = [p for p in DAMAI_PROXY_POOL if p.get("enabled", True)]
enabled_proxies = [p for p in PROXY_POOL if p.get("enabled", True)]
if not enabled_proxies:
raise ValueError("没有可用的代理")
proxy = random.choice(enabled_proxies)
return {
"server": proxy["server"],
@@ -87,12 +126,69 @@ def get_random_proxy() -> dict:
}
# 快捷访问
def format_proxy_url(proxy_config: dict) -> str:
"""
将代理配置格式化为Playwright可用的代理URL
Args:
proxy_config: 代理配置字典
Returns:
格式化的代理URL: http://username:password@host:port
"""
server = proxy_config['server'].replace('http://', '').replace('https://', '')
username = proxy_config['username']
password = proxy_config['password']
return f"http://{username}:{password}@{server}"
def format_proxy_for_playwright(proxy_config: dict) -> dict:
"""
将代理配置格式化为Playwright的proxy字典格式
Args:
proxy_config: 代理配置字典
Returns:
Playwright proxy配置: {"server": "...", "username": "...", "password": "..."}
"""
return {
"server": proxy_config['server'],
"username": proxy_config['username'],
"password": proxy_config['password']
}
# 快捷访问函数(保持向后兼容)
def get_proxy_1():
"""获取代理1配置"""
"""获取代理01配置"""
return get_proxy_config(0)
def get_proxy_2():
"""获取代理2配置"""
"""获取代理02配置"""
return get_proxy_config(1)
def get_proxy_3():
"""获取代理03配置"""
return get_proxy_config(2)
def get_proxy_4():
"""获取代理04配置"""
return get_proxy_config(3)
def get_proxy_5():
"""获取代理05配置"""
return get_proxy_config(4)
def get_proxy_6():
"""获取代理06配置"""
return get_proxy_config(5)
def get_proxy_7():
"""获取代理07配置"""
return get_proxy_config(6)


@@ -38,6 +38,20 @@ async def save_error_screenshot(
return None
try:
# 检查页面状态
try:
current_url = page.url
print(f"[错误截图] 当前URL: {current_url}", file=sys.stderr)
# 检查是否是空白页
if current_url in ['about:blank', '', 'data:,']:
print(f"[错误截图] 警告: 当前页面为空白页,截图可能没有内容", file=sys.stderr)
# 等待页面稳定
await page.wait_for_load_state('domcontentloaded', timeout=3000)
except Exception as state_error:
print(f"[错误截图] 检查页面状态失败: {str(state_error)}", file=sys.stderr)
# 生成文件名年月日时分秒_错误类型.png
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
@@ -52,11 +66,21 @@ async def save_error_screenshot(
filepath = SCREENSHOT_DIR / filename
# 截图
await page.screenshot(path=str(filepath), full_page=True)
# 截图(添加超时和全页截图)
await page.screenshot(
path=str(filepath),
full_page=True,
timeout=10000 # 10秒超时
)
# 检查截图文件大小
file_size = filepath.stat().st_size
print(f"[错误截图] 已保存: {filepath} (大小: {file_size} bytes)", file=sys.stderr)
# 如果文件太小小于5KB可能是空白截图
if file_size < 5120:
print(f"[错误截图] 警告: 截图文件过小 ({file_size} bytes),可能为空白页面", file=sys.stderr)
# 打印日志
print(f"[错误截图] 已保存: {filepath}", file=sys.stderr)
if error_message:
print(f"[错误截图] 错误信息: {error_message}", file=sys.stderr)
@@ -115,6 +139,19 @@ async def save_screenshot_with_html(
return None, None
try:
# 检查页面状态
try:
current_url = page.url
print(f"[错误截图] 当前URL: {current_url}", file=sys.stderr)
if current_url in ['about:blank', '', 'data:,']:
print(f"[错误截图] 警告: 当前页面为空白页", file=sys.stderr)
# 等待页面稳定
await page.wait_for_load_state('domcontentloaded', timeout=3000)
except Exception as state_error:
print(f"[错误截图] 检查页面状态失败: {str(state_error)}", file=sys.stderr)
# 生成文件名
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
safe_error_type = "".join(c for c in error_type if c.isalnum() or c in ('_', '-'))
@@ -126,7 +163,16 @@ async def save_screenshot_with_html(
# 保存截图
screenshot_path = SCREENSHOT_DIR / f"{base_filename}.png"
await page.screenshot(path=str(screenshot_path), full_page=True)
await page.screenshot(
path=str(screenshot_path),
full_page=True,
timeout=10000
)
# 检查截图文件大小
screenshot_size = screenshot_path.stat().st_size
if screenshot_size < 5120:
print(f"[错误截图] 警告: 截图文件过小 ({screenshot_size} bytes)", file=sys.stderr)
# 保存HTML
html_path = SCREENSHOT_DIR / f"{base_filename}.html"
@@ -134,8 +180,9 @@ async def save_screenshot_with_html(
with open(html_path, 'w', encoding='utf-8') as f:
f.write(html_content)
print(f"[错误截图] 已保存截图: {screenshot_path}", file=sys.stderr)
print(f"[错误截图] 已保存HTML: {html_path}", file=sys.stderr)
html_size = html_path.stat().st_size
print(f"[错误截图] 已保存截图: {screenshot_path} ({screenshot_size} bytes)", file=sys.stderr)
print(f"[错误截图] 已保存HTML: {html_path} ({html_size} bytes)", file=sys.stderr)
if error_message:
print(f"[错误截图] 错误信息: {error_message}", file=sys.stderr)


@@ -288,9 +288,10 @@ async def send_code(request: SendCodeRequest):
支持选择从创作者中心或小红书首页登录
并发支持:为每个请求分配独立的浏览器实例
"""
# 使用手机号作为session_id确保发送验证码和登录验证使用同一个浏览器
session_id = f"xhs_login_{request.phone}"
print(f"[发送验证码] session_id={session_id}, phone={request.phone}", file=sys.stderr)
# 使用随机UUID作为session_id确保每次都创建全新浏览器,完全不复用
import uuid
session_id = f"xhs_login_{uuid.uuid4().hex}"
print(f"[发送验证码] 创建全新浏览器实例 session_id={session_id}, phone={request.phone}", file=sys.stderr)
# 获取配置中的默认login_page如果API传入了则优先使用API参数
config = get_config()
@@ -315,6 +316,14 @@ async def send_code(request: SendCodeRequest):
)
if result["success"]:
# 验证浏览器是否已保存到池中
if browser_pool and session_id in browser_pool.temp_browsers:
print(f"[发送验证码] ✅ 浏览器实例已保存到池中: {session_id}", file=sys.stderr)
print(f"[发送验证码] 当前池中共有 {len(browser_pool.temp_browsers)} 个临时浏览器", file=sys.stderr)
else:
print(f"[发送验证码] ⚠️ 浏览器实例未保存到池中: {session_id}", file=sys.stderr)
print(f"[发送验证码] 池中的session列表: {list(browser_pool.temp_browsers.keys()) if browser_pool else 'None'}", file=sys.stderr)
return BaseResponse(
code=0,
message="验证码已发送请在小红书APP中查看",
@@ -421,6 +430,346 @@ async def verify_phone_code(request: VerifyCodeRequest):
data=None
)
@app.post("/api/xhs/qrcode/start", response_model=BaseResponse)
async def start_qrcode_login():
"""
启动小红书扫码登录,返回二维码图片和状态
每个用户必须使用独立的浏览器实例不能共享Context
"""
try:
print("[扫码登录] 启动扫码登录流程", file=sys.stderr)
# 使用随机UUID创建临时的登录服务实例完全不复用
import uuid
session_id = f"qrcode_login_{uuid.uuid4().hex}"
print(f"[扫码登录] 创建全新浏览器实例 session_id={session_id}", file=sys.stderr)
qrcode_service = XHSLoginService(
use_pool=True,
headless=login_service.headless,
session_id=session_id,
use_page_isolation=False # 小红书不支持页面隔离,必须独立浏览器
)
# 初始化浏览器
await qrcode_service.init_browser()
# 启动扫码登录
result = await qrcode_service.start_qrcode_login()
if result["success"]:
return BaseResponse(
code=0,
message="二维码获取成功",
data={
"session_id": session_id,
"qrcode_image": result["qrcode_image"],
"status_text": result.get("status_text", ""),
"status_desc": result.get("status_desc", ""),
"is_expired": result.get("is_expired", False),
# 添加二维码创建信息
"qr_url": result.get("qr_url", ""),
"qr_id": result.get("qr_id", ""),
"qr_code": result.get("qr_code", ""),
"multi_flag": result.get("multi_flag", 0)
}
)
else:
# 失败后释放临时浏览器
if browser_pool and session_id:
try:
await browser_pool.release_temp_browser(session_id)
print(f"[扫码登录] 已释放失败的session: {session_id}", file=sys.stderr)
except Exception as release_error:
print(f"[扫码登录] 释放浏览器失败: {str(release_error)}", file=sys.stderr)
return BaseResponse(
code=1,
message=result.get("error", "获取二维码失败"),
data=None
)
except Exception as e:
print(f"[扫码登录] 异常: {str(e)}", file=sys.stderr)
# 异常后释放临时浏览器
if browser_pool and 'session_id' in locals():
try:
await browser_pool.release_temp_browser(session_id)
print(f"[扫码登录] 已释放异常的session: {session_id}", file=sys.stderr)
except Exception as release_error:
print(f"[扫码登录] 释放浏览器失败: {str(release_error)}", file=sys.stderr)
return BaseResponse(
code=1,
message=f"启动扫码登录失败: {str(e)}",
data=None
)
@app.post("/api/xhs/qrcode/status")
async def get_qrcode_status(request: dict):
"""
轮询获取扫码状态和最新的二维码图片
"""
try:
session_id = request.get('session_id')
if not session_id:
return BaseResponse(
code=1,
message="session_id不能为空",
data=None
)
# 检查session是否存在于浏览器池中
if browser_pool and session_id not in browser_pool.temp_browsers:
print(f"[扫码状态] session_id={session_id} 已失效,要求重新创建二维码", file=sys.stderr)
return BaseResponse(
code=2, # 特殊错误码表示session失效
message="会话已失效,请刷新二维码重新开始",
data={
"session_expired": True
}
)
# 使用session_id获取浏览器实例
qrcode_service = XHSLoginService(
use_pool=True,
headless=login_service.headless,
session_id=session_id
)
# 初始化浏览器(会复用已有的)
await qrcode_service.init_browser()
# 提取当前二维码状态
result = await qrcode_service.extract_qrcode_with_status()
if result["success"]:
# 如果登录成功,返回登录信息
if result.get("login_success"):
return BaseResponse(
code=0,
message="扫码登录成功",
data={
"login_success": True,
"user_info": result.get("user_info"),
"cookies": result.get("cookies"),
"cookies_full": result.get("cookies_full"),
"login_state": result.get("login_state")
}
)
else:
# 还未登录,返回二维码状态
return BaseResponse(
code=0,
message="获取状态成功",
data={
"login_success": False,
"qrcode_image": result["qrcode_image"],
"status_text": result.get("status_text", ""),
"status_desc": result.get("status_desc", ""),
"is_expired": result.get("is_expired", False)
}
)
else:
return BaseResponse(
code=1,
message=result.get("error", "获取状态失败"),
data=None
)
except Exception as e:
print(f"[扫码状态] 异常: {str(e)}", file=sys.stderr)
import traceback
traceback.print_exc()
return BaseResponse(
code=1,
message=f"获取状态失败: {str(e)}",
data=None
)
@app.post("/api/xhs/qrcode/refresh")
async def refresh_qrcode(request: dict):
"""
刷新过期的二维码
"""
try:
session_id = request.get('session_id')
if not session_id:
return BaseResponse(
code=1,
message="session_id不能为空",
data=None
)
# 使用session_id获取浏览器实例
qrcode_service = XHSLoginService(
use_pool=True,
headless=login_service.headless,
session_id=session_id
)
# 初始化浏览器
await qrcode_service.init_browser()
# 刷新二维码
result = await qrcode_service.refresh_qrcode()
if result["success"]:
return BaseResponse(
code=0,
message="二维码刷新成功",
data={
"qrcode_image": result["qrcode_image"],
"status_text": result.get("status_text", ""),
"status_desc": result.get("status_desc", ""),
"is_expired": result.get("is_expired", False),
# 添加二维码创建信息
"qr_url": result.get("qr_url", ""),
"qr_id": result.get("qr_id", ""),
"qr_code": result.get("qr_code", ""),
"multi_flag": result.get("multi_flag", 0)
}
)
else:
# 检查是否需要重启
if result.get("need_restart"):
return BaseResponse(
code=3, # 特殊错误码,表示需要重启
message="页面已失效,请重新启动扫码登录",
data={
"need_restart": True
}
)
return BaseResponse(
code=1,
message=result.get("error", "刷新失败"),
data=None
)
except Exception as e:
print(f"[刷新二维码] 异常: {str(e)}", file=sys.stderr)
return BaseResponse(
code=1,
message=f"刷新二维码失败: {str(e)}",
data=None
)
@app.post("/api/xhs/save-bind-info")
async def save_bind_info(request: dict):
"""
保存扫码登录的绑定信息到Go后端
与验证码登录不同扫码登录直接返回了完整数据需要由Python转发给Go后端保存
"""
try:
employee_id = request.get('employee_id')
cookies_full = request.get('cookies_full', [])
user_info = request.get('user_info', {})
login_state = request.get('login_state', {})
if not employee_id:
return BaseResponse(
code=1,
message="employee_id不能为空",
data=None
)
# 调用Go后端API保存
config = get_config()
go_backend_url = config.get_str('go_backend.url', 'http://localhost:8080')
# Build the request payload, mimicking the return format of the bind-xhs endpoint
# Go后端期望接收的是验证码登录的结果
save_data = {
"employee_id": employee_id,
"cookies_full": cookies_full,
"user_info": user_info,
"login_state": login_state
}
import aiohttp
async with aiohttp.ClientSession() as session:
# 获取小程序传来的token
auth_header = request.get('Authorization', '')
async with session.post(
f"{go_backend_url}/api/xhs/save-qrcode-login",
json=save_data,
headers={'Authorization': auth_header} if auth_header else {}
) as resp:
result = await resp.json()
if resp.status == 200 and result.get('code') == 200:
return BaseResponse(
code=0,
message="保存成功",
data=result.get('data')
)
else:
return BaseResponse(
code=1,
message=result.get('message', '保存失败'),
data=None
)
except Exception as e:
print(f"[保存绑定信息] 异常: {str(e)}", file=sys.stderr)
import traceback
traceback.print_exc()
return BaseResponse(
code=1,
message=f"保存失败: {str(e)}",
data=None
)
@app.post("/api/xhs/qrcode/cancel")
async def cancel_qrcode_login(request: dict):
"""
取消扫码登录,释放浏览器资源
用于用户切换登录方式或关闭页面时
"""
try:
session_id = request.get('session_id')
if not session_id:
return BaseResponse(
code=1,
message="session_id不能为空",
data=None
)
# 释放临时浏览器
if browser_pool:
try:
await browser_pool.release_temp_browser(session_id)
print(f"[取消扫码] 已释放 session: {session_id}", file=sys.stderr)
return BaseResponse(
code=0,
message="已取消扫码登录",
data=None
)
except Exception as e:
print(f"[取消扫码] 释放浏览器失败: {str(e)}", file=sys.stderr)
# 即使失败也返回成功,不影响用户体验
return BaseResponse(
code=0,
message="已取消扫码登录",
data=None
)
else:
return BaseResponse(
code=0,
message="浏览器池未初始化",
data=None
)
except Exception as e:
print(f"[取消扫码] 异常: {str(e)}", file=sys.stderr)
return BaseResponse(
code=0, # 即使异常也返回成功
message="已取消扫码登录",
data=None
)
@app.post("/api/xhs/login", response_model=BaseResponse)
async def login(request: LoginRequest):
"""
@@ -429,13 +778,16 @@ async def login(request: LoginRequest):
支持选择从创作者中心或小红书首页登录
Concurrency support: the session_id from the send-code endpoint can be reused
"""
# 使用手机号作为session_id复用发送验证码时的浏览器
# 如果前端传session_id就使用前端的,否则根据手机号生成
# 必须使用前端传递的session_id复用浏览器
# 如果前端没有传session_id,说明前端实现有问题
if not request.session_id:
session_id = f"xhs_login_{request.phone}"
else:
session_id = request.session_id
return BaseResponse(
code=1,
message="Missing session_id parameter; the browser instance cannot be reused. Please resend the verification code.",
data=None
)
session_id = request.session_id
print(f"[登录验证] session_id={session_id}, phone={request.phone}", file=sys.stderr)
# Read the default login_page from the config; if the API request passes one, the API parameter takes precedence
@@ -448,7 +800,15 @@ async def login(request: LoginRequest):
try:
# If a session_id is present, reuse the browser from send-code; otherwise create a new one
if session_id:
print(f"[登录验证] 复用send-code的浏览器: {session_id}", file=sys.stderr)
print(f"[登录验证] 尝试复用send-code的浏览器: {session_id}", file=sys.stderr)
# 先检查浏览器池中是否存在该session
if browser_pool and session_id in browser_pool.temp_browsers:
print(f"[登录验证] ✅ 在浏览器池中找到session: {session_id}", file=sys.stderr)
else:
print(f"[登录验证] ⚠️ 浏览器池中未找到session: {session_id}", file=sys.stderr)
print(f"[登录验证] 当前池中的session列表: {list(browser_pool.temp_browsers.keys()) if browser_pool else 'None'}", file=sys.stderr)
request_login_service = XHSLoginService(
use_pool=True,
headless=login_service.headless, # 使用配置文件中的login.headless配置
@@ -456,6 +816,12 @@ async def login(request: LoginRequest):
)
# 初始化浏览器,以便从浏览器池获取临时浏览器
await request_login_service.init_browser()
# Double-check that the browser was initialized properly
if request_login_service.page:
print(f"[登录验证] ✅ Browser initialized, current URL: {request_login_service.page.url}", file=sys.stderr)
else:
print("[登录验证] ❌ Browser initialization failed: page is None", file=sys.stderr)
else:
# Legacy path: no session_id was passed, so fall back to the global login service
print(f"[登录验证] 使用全局登录服务(旧逻辑)", file=sys.stderr)
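
The QR-code refresh handler in this file maps the service result onto three response codes: 0 for success, 3 when the page is stale and the login must be restarted, and 1 for any other failure. A minimal sketch of that mapping as a pure function is shown here. The name `build_refresh_response` and the plain-dict envelope are assumptions made for illustration; the real handler returns `BaseResponse` objects and includes more fields.

```python
from typing import Any, Dict


def build_refresh_response(result: Dict[str, Any]) -> Dict[str, Any]:
    """Map a refresh_qrcode()-style result dict onto a {code, message, data} envelope.

    Hypothetical helper: it only mirrors the branching of the handler above.
    """
    if result.get("success"):
        return {
            "code": 0,
            "message": "QR code refreshed",
            "data": {
                "qrcode_image": result["qrcode_image"],
                "is_expired": result.get("is_expired", False),
            },
        }
    if result.get("need_restart"):
        # Code 3 tells the client to start the QR-code login over
        return {
            "code": 3,
            "message": "Page expired, restart QR-code login",
            "data": {"need_restart": True},
        }
    # Any other failure is reported with the generic error code
    return {"code": 1, "message": result.get("error", "Refresh failed"), "data": None}
```

Keeping the mapping separate from the browser call makes the three branches easy to unit test without launching Playwright.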

View File

@@ -13,3 +13,4 @@ alibabacloud_dysmsapi20170525==2.0.24
alibabacloud_credentials==0.3.4
alibabacloud_tea_openapi==0.3.9
alibabacloud_tea_util==0.3.13
loguru==0.7.2

View File

@@ -1,188 +0,0 @@
"""
Test the API response format
Verify that the login API correctly returns both cookies and cookies_full
"""
import json
def test_api_response_format():
"""测试 API 响应格式"""
# 模拟 API 返回的数据
mock_response = {
"code": 0,
"message": "登录成功",
"data": {
"user_info": {},
"cookies": {
"a1": "xxx",
"webId": "yyy",
"web_session": "zzz"
},
"cookies_full": [
{
"name": "a1",
"value": "xxx",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": 1797066496,
"httpOnly": False,
"secure": False,
"sameSite": "Lax"
},
{
"name": "webId",
"value": "yyy",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": 1797066496,
"httpOnly": False,
"secure": False,
"sameSite": "Lax"
},
{
"name": "web_session",
"value": "zzz",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": 1797066497,
"httpOnly": True,
"secure": True,
"sameSite": "Lax"
}
],
"login_time": "2025-12-12T23:30:00"
}
}
print("="*60)
print("API 响应格式测试")
print("="*60)
print()
# 检查响应结构
assert "code" in mock_response, "缺少 code 字段"
assert "message" in mock_response, "缺少 message 字段"
assert "data" in mock_response, "缺少 data 字段"
data = mock_response["data"]
# 检查 cookies 字段(键值对格式)
print("✅ 检查 cookies 字段(键值对格式):")
assert "cookies" in data, "缺少 cookies 字段"
assert isinstance(data["cookies"], dict), "cookies 应该是字典类型"
print(f" 类型: {type(data['cookies']).__name__}")
print(f" 示例: {json.dumps(data['cookies'], ensure_ascii=False, indent=2)}")
print()
# Check the cookies_full field (full Playwright format)
print("✅ Checking the cookies_full field (full Playwright format):")
assert "cookies_full" in data, "缺少 cookies_full 字段"
assert isinstance(data["cookies_full"], list), "cookies_full 应该是列表类型"
print(f" 类型: {type(data['cookies_full']).__name__}")
print(f" 数量: {len(data['cookies_full'])} 个 Cookie")
print(f" 示例(第一个):")
print(f"{json.dumps(data['cookies_full'][0], ensure_ascii=False, indent=6)}")
print()
# 检查 cookies_full 的每个元素
print("✅ 检查 cookies_full 的结构:")
for i, cookie in enumerate(data["cookies_full"]):
assert "name" in cookie, f"Cookie[{i}] 缺少 name 字段"
assert "value" in cookie, f"Cookie[{i}] 缺少 value 字段"
assert "domain" in cookie, f"Cookie[{i}] 缺少 domain 字段"
assert "path" in cookie, f"Cookie[{i}] 缺少 path 字段"
assert "expires" in cookie, f"Cookie[{i}] 缺少 expires 字段"
assert "httpOnly" in cookie, f"Cookie[{i}] 缺少 httpOnly 字段"
assert "secure" in cookie, f"Cookie[{i}] 缺少 secure 字段"
assert "sameSite" in cookie, f"Cookie[{i}] 缺少 sameSite 字段"
print(f" Cookie[{i}] ({cookie['name']}): ✅ 所有字段完整")
print()
print("="*60)
print("🎉 All checks passed! The API response format is correct")
print("="*60)
print()
# 使用场景说明
print("📝 使用场景:")
print()
print("1. Frontend display: use cookies (key-value format):")
print(" const cookies = response.data.cookies;")
print(" console.log(cookies.a1, cookies.webId);")
print()
print("2. Database storage: use cookies_full (full format):")
print(" const cookiesFull = response.data.cookies_full;")
print(" await db.saveCookies(userId, JSON.stringify(cookiesFull));")
print()
print("3. Python 脚本使用 - 使用 cookies_full:")
print(" cookies_full = response['data']['cookies_full']")
print(" publisher = XHSPublishService(cookies_full)")
print()
def compare_formats():
"""对比两种格式"""
print("="*60)
print("格式对比分析")
print("="*60)
print()
# 键值对格式
cookies_dict = {
"a1": "xxx",
"webId": "yyy",
"web_session": "zzz"
}
# Playwright 完整格式
cookies_full = [
{
"name": "a1",
"value": "xxx",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": 1797066496,
"httpOnly": False,
"secure": False,
"sameSite": "Lax"
}
]
print("📊 键值对格式:")
dict_str = json.dumps(cookies_dict, ensure_ascii=False, indent=2)
print(dict_str)
print(f" 大小: {len(dict_str)} 字符")
print()
print("📊 Playwright 完整格式:")
full_str = json.dumps(cookies_full, ensure_ascii=False, indent=2)
print(full_str)
print(f" 大小: {len(full_str)} 字符")
print()
print("📊 对比结果:")
print(f" 完整格式 vs 键值对格式: {len(full_str)} / {len(dict_str)} = {len(full_str)/len(dict_str):.1f}x")
print(f" 每个 Cookie 完整格式约增加: {(len(full_str) - len(dict_str)) // len(cookies_dict)} 字符")
print()
print("✅ 结论:")
print(" - 完整格式虽然较大,但包含所有必要属性")
print(" - 对于数据库存储,建议使用完整格式")
print(" - 对于前端展示,可以使用键值对格式")
print()
if __name__ == "__main__":
# 测试 API 响应格式
test_api_response_format()
# 对比两种格式
compare_formats()
print("="*60)
print("✅ 测试完成!")
print("="*60)
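
The test above checks that the login response carries cookies in two shapes: a key-value dict and the full Playwright list. The relationship between the two can be sketched as a one-way conversion. The function name `cookies_full_to_dict` is an assumption for illustration and does not appear in the repository.

```python
from typing import Any, Dict, List


def cookies_full_to_dict(cookies_full: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Collapse Playwright-style cookie dicts into a name -> value mapping.

    The conversion is lossy: domain, path, expires and the security flags are
    dropped, and if two cookies share a name the later one wins.
    """
    return {cookie["name"]: cookie["value"] for cookie in cookies_full}


if __name__ == "__main__":
    full = [
        {"name": "a1", "value": "xxx", "domain": ".xiaohongshu.com", "path": "/"},
        {"name": "web_session", "value": "zzz", "domain": ".xiaohongshu.com", "path": "/"},
    ]
    print(cookies_full_to_dict(full))
```

Because the key-value form cannot be turned back into the full form without guessing the dropped attributes, storing `cookies_full` is the safer choice when the cookies will later be injected into a browser.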

View File

@@ -1,170 +0,0 @@
"""
基础浏览器测试脚本
用于测试浏览器是否能正常加载小红书页面
"""
import asyncio
from playwright.async_api import async_playwright
import sys
async def test_basic_browser(proxy_index: int = 0):
"""基础浏览器测试"""
print(f"\n{'='*60}")
print(f"🔍 基础浏览器测试")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
try:
async with async_playwright() as p:
# 配置代理
proxy_parts = proxy_url.replace('http://', '').replace('https://', '').split('@')
if len(proxy_parts) == 2:
auth_part = proxy_parts[0]
server_part = proxy_parts[1]
# Split on the first ':' only, so a password that contains ':' is not broken apart
username, password = auth_part.split(':', 1)
proxy_config_obj = {
"server": f"http://{server_part}",
"username": username,
"password": password
}
else:
proxy_config_obj = {"server": proxy_url}
print(f" 配置的代理对象: {proxy_config_obj}")
# 启动浏览器
browser = await p.chromium.launch(
headless=False, # 非无头模式,便于观察
proxy=proxy_config_obj
)
# 创建上下文
context = await browser.new_context(
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)
# 创建页面
page = await context.new_page()
print(f"\n🌐 尝试访问百度...")
try:
await page.goto('https://www.baidu.com', wait_until='networkidle', timeout=15000)
await asyncio.sleep(2)
title = await page.title()
url = page.url
content_len = len(await page.content())
print(f" ✅ 百度访问成功")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
except Exception as e:
print(f" ❌ 百度访问失败: {str(e)}")
print(f"\n🌐 尝试访问小红书登录页...")
try:
await page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=15000)
await asyncio.sleep(5) # 等待更长时间
title = await page.title()
url = page.url
content = await page.content()
content_len = len(content)
print(f" 访问结果:")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
# 检查是否有特定内容
if content_len == 0:
print(f" ⚠️ 页面内容为空,可能存在加载问题")
elif "验证" in content or "captcha" in content.lower() or "安全" in content:
print(f" ⚠️ 检测到验证或安全提示")
else:
print(f" ✅ 页面加载正常")
# 查找页面上的所有元素
print(f"\n🔍 分析页面元素...")
# 查找所有input元素
inputs = await page.query_selector_all('input')
print(f" 找到 {len(inputs)} 个input元素")
# 查找所有表单相关元素
form_elements = await page.query_selector_all('input, button, select, textarea')
print(f" 找到 {len(form_elements)} 个表单相关元素")
# 打印前几个元素的信息
for i, elem in enumerate(form_elements[:5]):
try:
tag = await elem.evaluate('el => el.tagName')
text = await elem.inner_text()
placeholder = await elem.get_attribute('placeholder')
class_name = await elem.get_attribute('class')
id_attr = await elem.get_attribute('id')
print(f" 元素 {i+1}:")
print(f" - 标签: {tag}")
print(f" - 文本: {text[:50]}...")
print(f" - placeholder: {placeholder}")
print(f" - class: {(class_name or '')[:50]}...")
print(f" - id: {id_attr}")
except Exception as e:
print(f" 元素 {i+1}: 获取信息失败 - {str(e)}")
except Exception as e:
print(f" ❌ 小红书访问失败: {str(e)}")
import traceback
traceback.print_exc()
print(f"\n⏸️ 浏览器保持打开状态,您可以手动检查页面")
print(f" 按 Enter 键关闭浏览器...")
# 等待用户输入
input()
await browser.close()
print(f"✅ 浏览器已关闭")
except Exception as e:
print(f"❌ 测试过程异常: {str(e)}")
import traceback
traceback.print_exc()
async def main():
"""主函数"""
print("="*60)
print("🔍 基础浏览器测试工具")
print("="*60)
proxy_choice = input("\n请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
await test_basic_browser(proxy_idx)
print(f"\n{'='*60}")
print("✅ 测试完成!")
print("="*60)
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())
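
The script above assembles a proxy URL from the config and then splits it by hand on `@` and `:` to build the proxy object for Playwright. A sketch of the same parsing using the standard library is given below. The helper name `parse_proxy_url` is hypothetical, and it assumes the URL includes a scheme and that special characters in the credentials are percent-encoded.

```python
from urllib.parse import unquote, urlsplit


def parse_proxy_url(proxy_url: str) -> dict:
    """Turn 'http://user:pass@host:port' into a Playwright-style proxy dict.

    Hypothetical helper. The returned keys (server, username, password) are
    the ones accepted by the proxy option of Playwright's launch().
    """
    parts = urlsplit(proxy_url)
    if parts.hostname is None:
        raise ValueError(f"Proxy URL must include a scheme and host: {proxy_url!r}")
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    proxy = {"server": server}
    if parts.username:
        # Credentials may be percent-encoded inside a URL; decode them
        proxy["username"] = unquote(parts.username)
        proxy["password"] = unquote(parts.password or "")
    return proxy


if __name__ == "__main__":
    print(parse_proxy_url("http://user:secret@127.0.0.1:8080"))
```

Since the config already exposes `server`, `username` and `password` separately, passing those three values straight into the proxy dict would avoid the URL round trip altogether.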

View File

@@ -1,213 +0,0 @@
"""
Test the fixed browser pool
Verify that the preheat timeout problem has been resolved
"""
import asyncio
import sys
from xhs_login import XHSLoginService
async def test_browser_pool_with_proxy(proxy_index: int = 0):
"""测试修复后的浏览器池"""
print(f"\n{'='*60}")
print(f"🔧 测试修复后的浏览器池")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
print(f" 代理URL: {proxy_url}")
# 创建登录服务(使用浏览器池)
login_service = XHSLoginService(use_pool=True) # 使用浏览器池
try:
print(f"\n🚀 初始化浏览器(使用代理 + 浏览器池)...")
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
print("✅ 浏览器初始化成功")
# 检查浏览器池状态
browser_pool = login_service.browser_pool
if browser_pool:
stats = browser_pool.get_stats()
print(f"\n📊 浏览器池状态:")
print(f" 主浏览器存活: {stats['browser_alive']}")
print(f" 上下文存活: {stats['context_alive']}")
print(f" 页面存活: {stats['page_alive']}")
print(f" 是否预热: {stats['is_preheated']}")
print(f" 临时浏览器数: {stats['temp_browsers_count']}")
# 访问小红书登录页面
print(f"\n🌐 访问小红书创作者平台...")
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='domcontentloaded', timeout=30000)
await asyncio.sleep(2)
title = await login_service.page.title()
url = login_service.page.url
content_len = len(await login_service.page.content())
print(f"✅ 访问成功")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
# 检查关键元素
phone_input = await login_service.page.query_selector('input[placeholder="手机号"]')
if phone_input:
print(f"✅ 找到手机号输入框")
else:
print(f"❌ 未找到手机号输入框")
# 查找所有input元素
inputs = await login_service.page.query_selector_all('input')
print(f" 共找到 {len(inputs)} 个input元素")
if content_len == 0:
print(f"⚠️ 页面内容为空")
else:
print(f"✅ 页面内容正常加载")
return True
except Exception as e:
print(f"❌ 测试失败: {str(e)}")
import traceback
traceback.print_exc()
return False
finally:
await login_service.close_browser()
async def test_multiple_requests(proxy_index: int = 0):
"""测试多个请求复用浏览器池"""
print(f"\n{'='*60}")
print(f"🔄 测试浏览器池复用")
print(f"{'='*60}")
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
success_count = 0
for i in range(3):
print(f"\n🧪 请求 {i+1}/3")
login_service = XHSLoginService(use_pool=True)
try:
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
# 访问页面
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='domcontentloaded', timeout=30000)
await asyncio.sleep(1)
content_len = len(await login_service.page.content())
if content_len > 0:
print(f" ✅ 请求 {i+1} 成功,内容长度: {content_len}")
success_count += 1
else:
print(f" ❌ 请求 {i+1} 失败,内容为空")
except Exception as e:
print(f" ❌ 请求 {i+1} 异常: {str(e)}")
finally:
await login_service.close_browser()
# 等待一下避免请求过于频繁
if i < 2:
await asyncio.sleep(1)
print(f"\n📈 测试结果: {success_count}/3 请求成功")
return success_count == 3
def explain_fix():
"""解释修复内容"""
print("="*60)
print("🔧 修复内容说明")
print("="*60)
print("\n修复的两个问题:")
print("1. 增加超时时间: 从30秒增加到45秒")
print("2. 修改等待策略: 从'networkidle'改为'domcontentloaded'")
print(" - 'networkidle': 等待网络空闲(可能等待时间过长)")
print(" - 'domcontentloaded': fires once the DOM content has loaded; faster and more stable")
print("\n浏览器池优化效果:")
print("✅ 减少预热超时错误")
print("✅ 提高页面加载成功率")
print("✅ 保持浏览器常驻,提升性能")
async def main():
"""主函数"""
explain_fix()
print(f"\n{'='*60}")
print("🎯 选择测试模式")
print("="*60)
print("\n1. 单次浏览器池测试")
print("2. 多请求复用测试")
print("3. 全部测试")
try:
choice = input("\n请选择测试模式 (1-3, 默认为3): ").strip()
if choice not in ['1', '2', '3']:
choice = '3'
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
if choice in ['1', '3']:
print(f"\n{'-'*40}")
print("测试1: 单次浏览器池测试")
success1 = await test_browser_pool_with_proxy(proxy_idx)
if choice in ['2', '3']:
print(f"\n{'-'*40}")
print("测试2: 多请求复用测试")
success2 = await test_multiple_requests(proxy_idx)
if choice == '3':
overall_success = success1 and success2
elif choice == '1':
overall_success = success1
else:
overall_success = success2
print(f"\n{'='*60}")
if overall_success:
print("✅ 所有测试通过!浏览器池预热问题已修复")
else:
print("❌ 部分测试失败,请检查配置")
print("="*60)
except KeyboardInterrupt:
print("\n\n⚠️ 测试被用户中断")
except Exception as e:
print(f"\n❌ 测试过程中出现错误: {str(e)}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())

View File

@@ -1,162 +0,0 @@
"""
测试 Cookie 文件路径支持
"""
import subprocess
import sys
import json
def test_cookie_file_param():
"""测试 --cookies 参数支持文件路径"""
print("="*60)
print("测试 Cookie 文件路径参数支持")
print("="*60)
print()
# 测试命令
cmd = [
sys.executable,
"xhs_publish.py",
"--cookies", "test_cookies.json", # 使用文件路径
"--title", "【测试】Cookie文件路径参数",
"--content", "测试使用 --cookies 参数传递文件路径,而不是 JSON 字符串",
"--images", '["https://picsum.photos/800/600","https://picsum.photos/800/600"]',
"--tags", '["测试","Cookie文件","自动化"]'
]
print("执行命令:")
print(" ".join(cmd))
print()
print("-"*60)
print()
# 执行命令
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
encoding='utf-8'
)
# 输出结果
print("标准输出:")
print(result.stdout)
if result.stderr:
print("\n标准错误:")
print(result.stderr)
print()
print("-"*60)
# 解析结果
try:
# 尝试从输出中提取 JSON 结果
lines = result.stdout.strip().split('\n')
for i, line in enumerate(lines):
if line.strip().startswith('{'):
json_str = '\n'.join(lines[i:])
response = json.loads(json_str)
print("\n解析结果:")
print(json.dumps(response, ensure_ascii=False, indent=2))
if response.get('success'):
print("\n✅ Test passed! The cookie file path parameter works correctly")
if 'url' in response:
print(f"📎 笔记链接: {response['url']}")
else:
print(f"\n❌ 测试失败: {response.get('error')}")
break
except json.JSONDecodeError:
print("⚠️ 无法解析 JSON 输出")
return result.returncode == 0
except Exception as e:
print(f"❌ 执行失败: {str(e)}")
return False
def test_quick_publish():
"""测试 quick_publish.py 脚本"""
print("\n")
print("="*60)
print("测试 quick_publish.py 脚本")
print("="*60)
print()
cmd = [
sys.executable,
"quick_publish.py",
"【测试】快速发布脚本",
"测试 quick_publish.py 的简化调用方式",
"https://picsum.photos/800/600,https://picsum.photos/800/600",
"测试,快速发布,自动化",
"test_cookies.json"
]
print("执行命令:")
print(" ".join(cmd))
print()
print("-"*60)
print()
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
encoding='utf-8'
)
print(result.stdout)
if result.stderr:
print("\n标准错误:")
print(result.stderr)
return result.returncode == 0
except Exception as e:
print(f"❌ 执行失败: {str(e)}")
return False
if __name__ == "__main__":
print()
print("🧪 Cookie 文件路径支持测试")
print()
# 检查 Cookie 文件是否存在
import os
if not os.path.exists('test_cookies.json'):
print("❌ 错误: test_cookies.json 文件不存在")
print("请先创建 Cookie 文件")
sys.exit(1)
print("✅ 找到 Cookie 文件: test_cookies.json")
print()
# 测试1: xhs_publish.py 使用文件路径
success1 = test_cookie_file_param()
# 测试2: quick_publish.py
success2 = test_quick_publish()
# 总结
print()
print("="*60)
print("测试总结")
print("="*60)
print(f"xhs_publish.py (Cookie文件): {'✅ 通过' if success1 else '❌ 失败'}")
print(f"quick_publish.py: {'✅ 通过' if success2 else '❌ 失败'}")
print()
if success1 and success2:
print("🎉 所有测试通过!")
else:
print("⚠️ 部分测试失败,请检查错误信息")
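
The first test above recovers the script's JSON result from stdout by looking for the first line that starts with `{` and parsing everything from there to the end. A standalone sketch of that extraction follows. The function name `extract_json` is an assumption, and unlike the test it keeps scanning later lines when an earlier candidate fails to parse.

```python
import json
from typing import Optional


def extract_json(stdout: str) -> Optional[dict]:
    """Return the trailing JSON object printed after free-form log lines, if any."""
    lines = stdout.strip().split("\n")
    for i, line in enumerate(lines):
        if not line.strip().startswith("{"):
            continue
        try:
            parsed = json.loads("\n".join(lines[i:]))
        except json.JSONDecodeError:
            # This line only looked like the start of JSON; try the next one
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


if __name__ == "__main__":
    sample = 'starting browser\n{\n  "success": true,\n  "url": "https://example.com"\n}'
    print(extract_json(sample))
```

This only works when the JSON object is the last thing written to stdout, which is why the CLI tools in this repository send their logs to stderr.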

View File

@@ -1,313 +0,0 @@
"""
Test the cookie format handling fix
Verify that the _format_cookies method in scheduler.py handles the various cookie formats correctly
"""
import json
from typing import List, Dict
def _format_cookies(cookies) -> List[Dict]:
    """
    Normalize cookies, touching only non-standard formats.
    Cookies already in Playwright's native format are returned unchanged.
    This is a copy of the _format_cookies method in scheduler.py, kept here for standalone testing.

    Args:
        cookies: cookie data, either list[dict] or a dict of name -> value
    Returns:
        The formatted cookie list
    """
    # A dict of key-value pairs is converted to the list format
    if isinstance(cookies, dict):
        cookies = [
            {
                "name": name,
                "value": value if isinstance(value, str) else str(value),
                "domain": ".xiaohongshu.com",
                "path": "/"
            }
            for name, value in cookies.items()
        ]
    # Anything that is not a list at this point is invalid
    if not isinstance(cookies, list):
        raise ValueError(f"Cookies must be a list or a dict, got: {type(cookies).__name__}")
    # Playwright's native format is detected by the name and value fields
    if cookies and isinstance(cookies[0], dict) and 'name' in cookies[0] and 'value' in cookies[0]:
        # Already in Playwright format: return as-is, without any modification
        return cookies
    # Other formats get basic validation
    formatted_cookies = []
    for cookie in cookies:
        if not isinstance(cookie, dict):
            raise ValueError(f"Each cookie must be a dict, got: {type(cookie).__name__}")
        # Fill in the basic fields on a copy, so the caller's dict is never mutated
        if 'url' not in cookie:
            cookie = cookie.copy()
            cookie.setdefault('domain', '.xiaohongshu.com')
            cookie.setdefault('path', '/')
        formatted_cookies.append(cookie)
    return formatted_cookies
def test_format_cookies():
"""测试_format_cookies方法"""
print("="*60)
print("测试 Cookie 格式处理")
print("="*60)
# 测试1: 字典格式(键值对)
print("\n测试 1: 字典格式(键值对)")
cookies_dict = {
"a1": "xxx",
"webId": "yyy",
"web_session": "zzz"
}
try:
result = _format_cookies(cookies_dict)
print(f"✅ 成功处理字典格式")
print(f" 输入: {type(cookies_dict).__name__} with {len(cookies_dict)} items")
print(f" 输出: {type(result).__name__} with {len(result)} items")
print(f" 第一个Cookie: {result[0]}")
assert isinstance(result, list)
assert len(result) == 3
assert all('name' in c and 'value' in c and 'domain' in c for c in result)
except Exception as e:
print(f"❌ 失败: {str(e)}")
# Test 2: list format (full format, already has domain and path)
print("\n测试 2: 列表格式(完整格式)")
cookies_list_full = [
{
"name": "a1",
"value": "xxx",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": -1,
"httpOnly": False,
"secure": False,
"sameSite": "Lax"
}
]
try:
result = _format_cookies(cookies_list_full)
print(f"✅ 成功处理完整列表格式")
print(f" 输入: {type(cookies_list_full).__name__} with {len(cookies_list_full)} items")
print(f" 输出: {type(result).__name__} with {len(result)} items")
# Verify that Playwright's native format is preserved in full
print(f" 保留的字段: {list(result[0].keys())}")
assert result == cookies_list_full, "Playwright native format should be preserved in full, without any modification"
assert 'expires' in result[0], "expires字段应该被保留"
assert result[0]['expires'] == -1, "expires=-1应该被保留"
assert isinstance(result, list)
assert len(result) == 1
except Exception as e:
print(f"❌ 失败: {str(e)}")
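# Test 3: non-Playwright format (no name field; domain and path must be filled in)
print("\nTest 3: non-Playwright format (missing fields, must be filled in)")
cookies_list_partial = [
{
"cookie_name": "a1", # no name field, so this is not Playwright format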
"cookie_value": "xxx"
}
]
try:
result = _format_cookies(cookies_list_partial)
print(f"✅ 成功处理非Playwright格式")
print(f" 输入: {type(cookies_list_partial).__name__} with {len(cookies_list_partial)} items")
print(f" 输出: {type(result).__name__} with {len(result)} items")
print(f" 自动添加的字段: domain={result[0].get('domain')}, path={result[0].get('path')}")
assert isinstance(result, list)
# 应该自动添加domain和path
assert result[0]['domain'] == '.xiaohongshu.com'
assert result[0]['path'] == '/'
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试4: 双重JSON编码(模拟数据库存储场景)
print("\n测试 4: 双重JSON编码字符串")
cookies_dict = {"a1": "xxx", "webId": "yyy"}
# 第一次JSON编码
cookies_json_1 = json.dumps(cookies_dict)
# 第二次JSON编码
cookies_json_2 = json.dumps(cookies_json_1)
print(f" 原始字典: {cookies_dict}")
print(f" 第一次编码: {cookies_json_1}")
print(f" 第二次编码: {cookies_json_2}")
# 模拟从数据库读取并解析
try:
# 第一次解析
cookies_parsed_1 = json.loads(cookies_json_2)
print(f" 第一次解析后类型: {type(cookies_parsed_1).__name__}")
# 处理双重编码
if isinstance(cookies_parsed_1, str):
cookies_parsed_2 = json.loads(cookies_parsed_1)
print(f" 第二次解析后类型: {type(cookies_parsed_2).__name__}")
cookies = cookies_parsed_2
else:
cookies = cookies_parsed_1
# 格式化
result = _format_cookies(cookies)
print(f"✅ 成功处理双重JSON编码")
print(f" 最终输出: {type(result).__name__} with {len(result)} items")
assert isinstance(result, list)
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试5: 错误格式 - 字符串(不是JSON)
print("\n测试 5: 错误格式 - 普通字符串")
try:
result = _format_cookies("invalid_string")
print(f"❌ 应该抛出异常但没有")
except ValueError as e:
print(f"✅ 正确抛出ValueError异常")
print(f" 错误信息: {str(e)}")
except Exception as e:
print(f"❌ 抛出了非预期的异常: {str(e)}")
# 测试6: 错误格式 - 列表中包含非字典元素
print("\n测试 6: 错误格式 - 列表中包含非字典元素")
try:
result = _format_cookies(["string_item", 123])
print(f"❌ 应该抛出异常但没有")
except ValueError as e:
print(f"✅ 正确抛出ValueError异常")
print(f" 错误信息: {str(e)}")
except Exception as e:
print(f"❌ 抛出了非预期的异常: {str(e)}")
# Test 7: Playwright native format where value is an object (kept as-is)
print("\nTest 7: Playwright native format where value is an object (should be kept as-is)")
cookies_with_object_value = [
{
"name": "test_cookie",
"value": {"nested": "object"}, # value是对象
"domain": ".xiaohongshu.com",
"path": "/"
}
]
try:
result = _format_cookies(cookies_with_object_value)
print(f"✅ Playwright原生格式被完整保留")
print(f" 输入value类型: {type(cookies_with_object_value[0]['value']).__name__}")
print(f" 输出value类型: {type(result[0]['value']).__name__}")
print(f" 输出value内容: {result[0]['value']}")
# Playwright native format is not modified in any way, including value
assert result == cookies_with_object_value, "Playwright native format should be preserved in full"
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试8: 字典格式中value为数字
print("\n测试 8: 字典格式中value为数字应自动转换为字符串")
cookies_dict_with_number = {
"a1": "xxx",
"user_id": 12345, # value是数字
"is_login": True # value是布尔值
}
try:
result = _format_cookies(cookies_dict_with_number)
print(f"✅ 成功处理数字/布尔value")
print(f" 输入: {cookies_dict_with_number}")
print(f" user_id value类型: {type(result[1]['value']).__name__}, 值: {result[1]['value']}")
print(f" is_login value类型: {type(result[2]['value']).__name__}, 值: {result[2]['value']}")
# 验证不再包含expires等字段
print(f" 字段: {list(result[0].keys())}")
assert all(isinstance(c['value'], str) for c in result), "所有value应该都是字符串类型"
assert 'expires' not in result[0], "不应该包含expires字段"
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试9: Playwright原生格式中expires=-1应被保留
print("\n测试 9: Playwright原生格式中expires=-1应被保留")
cookies_with_invalid_expires = [
{
"name": "test_cookie",
"value": "test_value",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": -1 # Playwright原生格式
}
]
try:
result = _format_cookies(cookies_with_invalid_expires)
print(f"✅ Playwright原生格式被完整保留")
print(f" 原始字段: {list(cookies_with_invalid_expires[0].keys())}")
print(f" 处理后字段: {list(result[0].keys())}")
assert result == cookies_with_invalid_expires, "Playwright原生格式应被完整保留"
assert 'expires' in result[0] and result[0]['expires'] == -1, "expires=-1应该被保留"
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试10: Playwright原生格式中expires为浮点数应被保留
print("\n测试 10: Playwright原生格式中expires为浮点数应被保留")
cookies_with_float_expires = [
{
"name": "test_cookie",
"value": "test_value",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": 1797066497.112584 # Playwright原生格式常常有浮点数
}
]
try:
result = _format_cookies(cookies_with_float_expires)
print(f"✅ Playwright原生格式被完整保留")
print(f" 原始expires: {cookies_with_float_expires[0]['expires']} (类型: {type(cookies_with_float_expires[0]['expires']).__name__})")
print(f" 处理后expires: {result[0]['expires']} (类型: {type(result[0]['expires']).__name__})")
assert result == cookies_with_float_expires, "Playwright原生格式应被完整保留"
assert isinstance(result[0]['expires'], float), "expires浮点数应该被保留"
except Exception as e:
print(f"❌ 失败: {str(e)}")
# 测试11: Playwright原生格式中sameSite大小写应被保留
print("\n测试 11: Playwright原生格式中sameSite应被完整保留")
cookies_with_samesite = [
{
"name": "test_cookie1",
"value": "test_value1",
"domain": ".xiaohongshu.com",
"path": "/",
"sameSite": "Lax" # Playwright原生格式
},
{
"name": "test_cookie2",
"value": "test_value2",
"domain": ".xiaohongshu.com",
"path": "/",
"sameSite": "Strict"
}
]
try:
result = _format_cookies(cookies_with_samesite)
print(f"✅ Playwright原生格式被完整保留")
print(f" cookie1 sameSite: {result[0]['sameSite']}")
print(f" cookie2 sameSite: {result[1]['sameSite']}")
assert result == cookies_with_samesite, "Playwright原生格式应被完整保留"
assert result[0]['sameSite'] == 'Lax'
assert result[1]['sameSite'] == 'Strict'
except Exception as e:
print(f"❌ 失败: {str(e)}")
print("\n" + "="*60)
print("测试完成")
print("="*60)
if __name__ == "__main__":
test_format_cookies()
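
Test 4 above handles cookies that were JSON-encoded twice before being stored, by calling `json.loads` a second time when the first pass still yields a string. That step can be sketched as a small reusable helper. The name `decode_nested_json` and the `max_depth` cap are assumptions for illustration and are not part of scheduler.py.

```python
import json
from typing import Any


def decode_nested_json(raw: Any, max_depth: int = 3) -> Any:
    """Decode a value that may have been JSON-encoded more than once.

    Keeps calling json.loads while the value is still a string, up to
    max_depth times. Raises json.JSONDecodeError if a string is not valid
    JSON, so it is meant for payloads known to end in a list or dict.
    """
    value = raw
    for _ in range(max_depth):
        if not isinstance(value, str):
            break
        value = json.loads(value)
    return value


if __name__ == "__main__":
    stored = json.dumps(json.dumps({"a1": "xxx", "webId": "yyy"}))
    print(decode_nested_json(stored))
```

The decoded value can then be handed to `_format_cookies`, which accepts either the dict or the list shape.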

View File

@@ -1,143 +0,0 @@
"""
测试两种 Cookie 格式支持
"""
import asyncio
import json
from xhs_publish import XHSPublishService
# 格式1: Playwright 完整格式(从文件读取)
playwright_cookies = [
{
"name": "a1",
"value": "19b11d16e24t3h3xmlvojbrw1cr55xwamiacluw3c50000231766",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": 1797066496,
"httpOnly": False,
"secure": False,
"sameSite": "Lax"
},
{
"name": "web_session",
"value": "030037ae088f0acf2c81329d432e4a12fcb0ca",
"domain": ".xiaohongshu.com",
"path": "/",
"expires": 1797066497.112584,
"httpOnly": True,
"secure": True,
"sameSite": "Lax"
}
]
# 格式2: 键值对格式(从数据库读取)
keyvalue_cookies = {
"a1": "19b11d16e24t3h3xmlvojbrw1cr55xwamiacluw3c50000231766",
"abRequestId": "b273b4d0-3ef7-5b8f-bba4-2d19e63ad883",
"acw_tc": "0a4ae09717655304937202738e4b75c08d6eb78f2c8d30d7dc5a465429e1e6",
"gid": "yjDyyfyKiD6DyjDyyfyKd37EJ49qxqC61hlV0qSDFEySFS2822CE01888JqyWKK8Djdi8d2j",
"loadts": "1765530496548",
"sec_poison_id": "a589e333-c364-477c-9d14-53af8a1e7f1c",
"unread": "{%22ub%22:%22648455690000000014025d90%22%2C%22ue%22:%2264b34737000000002f0262f9%22%2C%22uc%22:22}",
"webBuild": "5.0.6",
"webId": "fdf2dccee4bec7534aff5581310c0e26",
"web_session": "030037ae088f0acf2c81329d432e4a12fcb0ca",
"websectiga": "984412fef754c018e472127b8effd174be8a5d51061c991aadd200c69a2801d6",
"xsecappid": "xhs-pc-web"
}
async def test_playwright_format():
"""测试 Playwright 格式"""
print("="*60)
print("测试 1: Playwright 格式(完整格式)")
print("="*60)
try:
publisher = XHSPublishService(playwright_cookies)
print("✅ 初始化成功")
print(f" 转换后的 Cookie 数量: {len(publisher.cookies)}")
return True
except Exception as e:
print(f"❌ 初始化失败: {e}")
return False
async def test_keyvalue_format():
"""测试键值对格式"""
print("\n" + "="*60)
print("测试 2: 键值对格式(数据库格式)")
print("="*60)
try:
publisher = XHSPublishService(keyvalue_cookies)
print("✅ 初始化成功")
print(f" 转换后的 Cookie 数量: {len(publisher.cookies)}")
# 显示转换后的一个示例
print("\nConversion example (first cookie):")
print(json.dumps(publisher.cookies[0], ensure_ascii=False, indent=2))
return True
except Exception as e:
print(f"❌ 初始化失败: {e}")
return False
async def test_from_file():
"""从文件读取测试"""
print("\n" + "="*60)
print("测试 3: 从 cookies.json 文件读取")
print("="*60)
try:
with open('cookies.json', 'r', encoding='utf-8') as f:
cookies = json.load(f)
publisher = XHSPublishService(cookies)
print("✅ 初始化成功")
print(f" Cookie 数量: {len(publisher.cookies)}")
return True
except FileNotFoundError:
print("⚠️ cookies.json 文件不存在,跳过此测试")
return None
except Exception as e:
print(f"❌ 初始化失败: {e}")
return False
async def main():
print("\n🧪 Cookie 格式兼容性测试\n")
# 测试1: Playwright格式
result1 = await test_playwright_format()
# 测试2: 键值对格式
result2 = await test_keyvalue_format()
# 测试3: 从文件读取
result3 = await test_from_file()
# 总结
print("\n" + "="*60)
print("测试总结")
print("="*60)
print(f"Playwright 格式: {'✅ 通过' if result1 else '❌ 失败'}")
print(f"键值对格式: {'✅ 通过' if result2 else '❌ 失败'}")
if result3 is not None:
print(f"文件读取: {'✅ 通过' if result3 else '❌ 失败'}")
else:
print(f"文件读取: ⚠️ 跳过")
if result1 and result2:
print("\n🎉 所有格式测试通过!")
print("\n💡 使用说明:")
print(" - 从 Python 脚本保存的 cookies.json → Playwright 格式")
print(" - 从数据库读取的 Cookie → 键值对格式")
print(" - 两种格式都可以正常使用!")
else:
print("\n⚠️ 部分测试失败")
if __name__ == "__main__":
asyncio.run(main())

View File

@@ -1,31 +0,0 @@
@echo off
chcp 65001 >nul
echo ========================================
echo 小红书Cookie注入测试工具
echo ========================================
echo.
echo 此工具使用Playwright真实注入Cookie
echo 支持验证Cookie有效性并跳转到指定页面
echo.
echo ========================================
echo.
cd /d %~dp0
REM 检查是否有cookies.json文件
if exist cookies.json (
echo 检测到 cookies.json 文件
echo.
python test_cookie_inject.py
) else (
echo 未找到 cookies.json 文件
echo 请先准备Cookie文件或在程序中手动输入
echo.
python test_cookie_inject.py
)
echo.
echo ========================================
echo 测试完成
echo ========================================
pause

View File

@@ -1,398 +0,0 @@
"""
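Cookie injection test script
Uses Playwright to inject cookies and verify that they are valid
Supports navigating to the creator center or the Xiaohongshu home page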
"""
import asyncio
import sys
import json
from pathlib import Path
from playwright.async_api import async_playwright
from typing import Optional, List, Dict, Any
class CookieInjector:
"""Cookie注入器"""
def __init__(self, headless: bool = False):
"""
初始化Cookie注入器
Args:
headless: whether to run in headless mode; with False the browser window is visible
"""
self.headless = headless
self.playwright = None
self.browser = None
self.context = None
self.page = None
async def init_browser(self):
"""初始化浏览器"""
try:
print("正在启动浏览器...")
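# On Windows, set the event loop policy
# Note: this runs inside an already-running coroutine, so it only affects event loops created afterwards, not the current one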
if sys.platform == 'win32':
try:
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
except Exception as e:
print(f"警告: 设置事件循环策略失败: {str(e)}")
self.playwright = await async_playwright().start()
# 启动浏览器
self.browser = await self.playwright.chromium.launch(
headless=self.headless,
args=['--disable-blink-features=AutomationControlled']
)
# 创建浏览器上下文
self.context = await self.browser.new_context(
viewport={'width': 1280, 'height': 720},
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)
# 创建新页面
self.page = await self.context.new_page()
print("浏览器初始化成功")
except Exception as e:
print(f"浏览器初始化失败: {str(e)}")
raise
async def inject_cookies(self, cookies: List[Dict[str, Any]]) -> bool:
"""
注入Cookie
Args:
cookies: Cookie列表
Returns:
是否注入成功
"""
try:
if not self.context:
await self.init_browser()
print(f"正在注入 {len(cookies)} 个Cookie...")
# 注入Cookie到浏览器上下文
await self.context.add_cookies(cookies)
print("Cookie注入成功")
return True
except Exception as e:
print(f"Cookie注入失败: {str(e)}")
return False
async def verify_and_navigate(self, target_page: str = 'creator') -> Dict[str, Any]:
"""
Verify the cookies and navigate to the target page.
Args:
target_page: target page type ('creator' or 'home')
Returns:
Dictionary describing the verification result
"""
try:
if not self.page:
return {"success": False, "error": "浏览器未初始化"}
# 确定目标URL
urls = {
'creator': 'https://creator.xiaohongshu.com',
'home': 'https://www.xiaohongshu.com'
}
target_url = urls.get(target_page, urls['creator'])
page_name = '创作者中心' if target_page == 'creator' else '小红书首页'
print(f"\n正在访问{page_name}: {target_url}")
# 访问目标页面
await self.page.goto(target_url, wait_until='networkidle', timeout=30000)
await asyncio.sleep(2) # 等待页面完全加载
# 获取当前URL和标题
current_url = self.page.url
title = await self.page.title()
print(f"当前URL: {current_url}")
print(f"页面标题: {title}")
# 检查是否被重定向到登录页
is_logged_in = 'login' not in current_url.lower()
if is_logged_in:
print("Cookie验证成功已登录状态")
# 尝试获取用户信息
try:
# 等待用户相关元素出现(如头像、用户名等)
await self.page.wait_for_selector('[class*="avatar"], [class*="user"]', timeout=5000)
print("检测到用户信息元素,确认登录成功")
except Exception:
print("未检测到明显的用户信息元素,但未跳转到登录页")
return {
"success": True,
"message": f"Cookie有效已成功访问{page_name}",
"url": current_url,
"title": title,
"logged_in": True
}
else:
print("Cookie可能已失效页面跳转到登录页")
return {
"success": False,
"error": "Cookie已失效或无效页面跳转到登录页",
"url": current_url,
"title": title,
"logged_in": False
}
except Exception as e:
print(f"验证过程异常: {str(e)}")
import traceback
traceback.print_exc()
return {
"success": False,
"error": f"验证过程异常: {str(e)}"
}
async def keep_browser_open(self, duration: int = 60):
"""
Keep the browser open for a while so the page can be inspected.
Args:
duration: seconds to stay open (0 keeps it open until closed manually)
"""
try:
if duration == 0:
print("\n浏览器将保持打开,按 Ctrl+C 关闭...")
try:
while True:
await asyncio.sleep(1)
except KeyboardInterrupt:
print("\n用户中断,准备关闭浏览器...")
else:
print(f"\n浏览器将保持打开 {duration} 秒...")
await asyncio.sleep(duration)
print("时间到,准备关闭浏览器...")
except Exception as e:
print(f"保持浏览器异常: {str(e)}")
async def close_browser(self):
"""关闭浏览器"""
try:
print("\n正在关闭浏览器...")
if self.page:
await self.page.close()
if self.context:
await self.context.close()
if self.browser:
await self.browser.close()
if self.playwright:
await self.playwright.stop()
print("浏览器已关闭")
except Exception as e:
print(f"关闭浏览器异常: {str(e)}")
def load_cookies_from_file(file_path: str) -> Optional[List[Dict[str, Any]]]:
    """
    Load cookies from a file.
    Args:
        file_path: path to the cookie file
    Returns:
        List of cookies, or None on failure
    """
    try:
        cookie_file = Path(file_path)
        if not cookie_file.exists():
            print(f"Cookie file does not exist: {file_path}")
            return None
        with open(cookie_file, 'r', encoding='utf-8') as f:
            cookies = json.load(f)
        if not isinstance(cookies, list):
            print("Invalid cookie format: expected a JSON array")
            return None
        if len(cookies) == 0:
            print("Cookie array is empty")
            return None
        # Every cookie must be an object with non-empty name and value fields
        for cookie in cookies:
            if not isinstance(cookie, dict) or not cookie.get('name') or not cookie.get('value'):
                print("Invalid cookie format: missing name or value field")
                return None
        print(f"Loaded {len(cookies)} cookies")
        return cookies
    except json.JSONDecodeError as e:
        print(f"Failed to parse cookie file as JSON: {str(e)}")
        return None
    except Exception as e:
        print(f"Failed to load cookie file: {str(e)}")
        return None
async def test_cookie_inject(
cookies_source: str,
target_page: str = 'creator',
headless: bool = False,
keep_open: int = 0
):
"""
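Test cookie injection.
Args:
cookies_source: cookie source (a file path or a JSON string)
target_page: target page ('creator' or 'home')
headless: whether to run headless
keep_open: seconds to keep the browser open (0 keeps it open until closed manually)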
"""
print("="*60)
print("Cookie注入并验证测试")
print("="*60)
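    # Load cookies
    cookies = None
    # First try to treat the source as a file path
    if Path(cookies_source).exists():
        print(f"\nLoading cookies from file: {cookies_source}")
        cookies = load_cookies_from_file(cookies_source)
    else:
        # Otherwise try to parse it as a JSON string
        try:
            print("\nTrying to parse the cookie JSON string...")
            parsed = json.loads(cookies_source)
            if isinstance(parsed, list) and len(parsed) > 0:
                cookies = parsed
                print(f"Parsed {len(cookies)} cookies")
            else:
                # Reject anything that is not a non-empty array
                print("Invalid cookie format: expected a non-empty JSON array")
        except Exception as e:
            print(f"Failed to parse cookies: {str(e)}")
    if not cookies:
        print("\nFailed to load cookies, please check the input")
        return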
# 创建注入器
injector = CookieInjector(headless=headless)
try:
# 初始化浏览器
await injector.init_browser()
# 注入Cookie
inject_success = await injector.inject_cookies(cookies)
if not inject_success:
print("\nCookie注入失败")
return
# 验证并跳转
result = await injector.verify_and_navigate(target_page)
print("\n" + "="*60)
print("验证结果")
print("="*60)
if result.get('success'):
print(f"状态: 成功")
print(f"消息: {result.get('message')}")
print(f"URL: {result.get('url')}")
print(f"标题: {result.get('title')}")
print(f"登录状态: {'已登录' if result.get('logged_in') else '未登录'}")
else:
print(f"状态: 失败")
print(f"错误: {result.get('error')}")
if result.get('url'):
print(f"当前URL: {result.get('url')}")
# 保持浏览器打开
if keep_open >= 0:
await injector.keep_browser_open(keep_open)
except KeyboardInterrupt:
print("\n\n用户中断测试")
except Exception as e:
print(f"\n测试过程异常: {str(e)}")
import traceback
traceback.print_exc()
finally:
await injector.close_browser()
print("\n" + "="*60)
print("测试完成")
print("="*60)
async def main():
    """Entry point."""
    print("="*60)
    print("Xiaohongshu Cookie Injection Test Tool")
    print("="*60)
    print("\nFeatures:")
    print("1. Inject cookies into the browser")
    print("2. Verify that the cookies are valid")
    print("3. Navigate to the chosen page (Creator Center / Xiaohongshu home page)")
    print("\n" + "="*60)
    # Ask for the cookie source
    print("\nEnter the cookie source:")
    print("1. A path to a cookie file (e.g. cookies.json)")
    print("2. Or paste the cookies directly as JSON")
    cookies_source = input("\nCookie source: ").strip()
    if not cookies_source:
        print("Cookie source must not be empty")
        return
    # Choose the target page
    print("\nChoose the target page:")
    print("1. Creator Center (creator.xiaohongshu.com)")
    print("2. Xiaohongshu home page (www.xiaohongshu.com)")
    page_choice = input("\nChoice (1 or 2, default 1): ").strip()
    target_page = 'home' if page_choice == '2' else 'creator'
    # Choose the browser mode
    headless_choice = input("\nRun headless? (y/n, default n): ").strip().lower()
    headless = headless_choice == 'y'
    # Choose how long to keep the browser open
    keep_open_input = input("\nSeconds to keep the browser open (0 = until closed manually, default 60): ").strip()
    try:
        keep_open = int(keep_open_input) if keep_open_input else 60
    except ValueError:
        keep_open = 60
    # Run the test
    await test_cookie_inject(
        cookies_source=cookies_source,
        target_page=target_page,
        headless=headless,
        keep_open=keep_open
    )
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())
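
The verification step in this script treats any URL containing the substring `login` as a redirect to the login page, which would also match a query parameter such as `?from=login`. A slightly stricter check, sketched below, looks only at the path segments. The function name `redirected_to_login` is hypothetical, and the assumption that the site redirects to a `/login` path is based only on the login URL used elsewhere in these scripts.

```python
from urllib.parse import urlsplit


def redirected_to_login(current_url: str) -> bool:
    """Return True when one of the URL's path segments is exactly 'login'."""
    path = urlsplit(current_url).path.lower()
    return "login" in path.split("/")
```

In `verify_and_navigate`, `is_logged_in` could then be computed as `not redirected_to_login(current_url)`.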

View File

@@ -1,207 +0,0 @@
"""
大麦固定代理IP测试脚本
测试两个固定代理IP在无头浏览器中的可用性
"""
import asyncio
import sys
from playwright.async_api import async_playwright
# 大麦固定代理IP配置
DAMAI_PROXIES = [
{
"name": "大麦代理1",
"server": "http://36.137.177.131:50001",
"username": "qqwvy0",
"password": "mun3r7xz"
},
{
"name": "大麦代理2",
"server": "http://111.132.40.72:50002",
"username": "ih3z07",
"password": "078bt7o5"
}
]
async def test_proxy(proxy_config: dict):
"""
测试单个代理IP
Args:
proxy_config: 代理配置字典
"""
print(f"\n{'='*60}")
print(f"🔍 开始测试: {proxy_config['name']}")
print(f" 代理服务器: {proxy_config['server']}")
print(f"   Credentials: {proxy_config['username']} / {'*' * len(proxy_config['password'])}")  # mask the password in logs
print(f"{'='*60}")
playwright = None
browser = None
try:
# 启动Playwright
playwright = await async_playwright().start()
print("✅ Playwright启动成功")
# 配置代理
proxy_settings = {
"server": proxy_config["server"],
"username": proxy_config["username"],
"password": proxy_config["password"]
}
# 启动浏览器(带代理)
print(f"🚀 Launching browser (proxy: {proxy_config['server']})...")
browser = await playwright.chromium.launch(
headless=True,
proxy=proxy_settings,
args=[
'--disable-blink-features=AutomationControlled',
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process',
]
)
print("✅ 浏览器启动成功")
# 创建上下文
context = await browser.new_context(
viewport={'width': 1280, 'height': 720},
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)
print("✅ 浏览器上下文创建成功")
# 创建页面
page = await context.new_page()
print("✅ 页面创建成功")
# 测试1: 访问IP检测网站检查代理IP是否生效
print("\n📍 测试1: 访问IP检测网站...")
try:
await page.goto("http://httpbin.org/ip", timeout=30000)
await asyncio.sleep(2)
# 获取页面内容
content = await page.content()
print("✅ 访问成功,页面内容:")
print(content[:500]) # 只显示前500字符
# 尝试提取IP信息
ip_info = await page.evaluate("() => document.body.innerText")
print(f"\n🌐 当前IP信息:\n{ip_info}")
except Exception as e:
print(f"❌ 测试1失败: {str(e)}")
# 测试2: 访问小红书登录页(检查代理在实际场景中是否可用)
print("\n📍 测试2: 访问小红书登录页...")
try:
await page.goto("https://creator.xiaohongshu.com/login", timeout=30000)
await asyncio.sleep(3)
title = await page.title()
url = page.url
print(f"✅ 访问成功")
print(f" 页面标题: {title}")
print(f" 当前URL: {url}")
except Exception as e:
print(f"❌ 测试2失败: {str(e)}")
# 测试3: 访问大麦网(测试目标网站)
print("\n📍 测试3: 访问大麦网...")
try:
await page.goto("https://www.damai.cn/", timeout=30000)
await asyncio.sleep(3)
title = await page.title()
url = page.url
print(f"✅ 访问成功")
print(f" 页面标题: {title}")
print(f" 当前URL: {url}")
except Exception as e:
print(f"❌ 测试3失败: {str(e)}")
print(f"\n{proxy_config['name']} 测试完成")
except Exception as e:
print(f"\n{proxy_config['name']} 测试失败: {str(e)}")
import traceback
traceback.print_exc()
finally:
# 清理资源
try:
if browser:
await browser.close()
print("🧹 浏览器已关闭")
if playwright:
await playwright.stop()
print("🧹 Playwright已停止")
except Exception as e:
print(f"⚠️ 清理资源时出错: {str(e)}")
async def test_all_proxies():
"""测试所有代理IP"""
print("\n" + "="*60)
print("🎯 大麦固定代理IP测试")
print("="*60)
print(f"📊 共配置 {len(DAMAI_PROXIES)} 个代理IP")
# 依次测试每个代理
for i, proxy_config in enumerate(DAMAI_PROXIES, 1):
print(f"\n\n{'#'*60}")
print(f"# 测试进度: {i}/{len(DAMAI_PROXIES)}")
print(f"{'#'*60}")
await test_proxy(proxy_config)
# 测试间隔
if i < len(DAMAI_PROXIES):
print(f"\n⏳ 等待5秒后测试下一个代理...")
await asyncio.sleep(5)
print("\n" + "="*60)
print("🎉 所有代理测试完成!")
print("="*60)
async def test_single_proxy(index: int = 0):
"""
测试单个代理IP
Args:
index: 代理索引0或1
"""
if index < 0 or index >= len(DAMAI_PROXIES):
print(f"❌ 无效的代理索引: {index},请使用 0 或 1")
return
await test_proxy(DAMAI_PROXIES[index])
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 解析命令行参数
if len(sys.argv) > 1:
try:
proxy_index = int(sys.argv[1])
print(f"🎯 Testing a single proxy (index: {proxy_index})")
asyncio.run(test_single_proxy(proxy_index))
except ValueError:
print("❌ 参数错误,请使用: python test_damai_proxy.py [0|1]")
print(" 0: 测试代理1")
print(" 1: 测试代理2")
print(" 不带参数: 测试所有代理")
else:
# 测试所有代理
asyncio.run(test_all_proxies())
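
This script keeps proxy credentials in the source file and prints them during the test. One common alternative is to read them from the environment and to log only a masked description. The sketch below assumes hypothetical variable names of the form `<PREFIX>_SERVER`, `<PREFIX>_USERNAME` and `<PREFIX>_PASSWORD`; neither the names nor the helpers exist in this repository.

```python
import os
from typing import Dict, Mapping, Optional


def load_proxy_from_env(prefix: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build a Playwright-style proxy dict from <prefix>_SERVER/_USERNAME/_PASSWORD."""
    env = os.environ if env is None else env
    server = env.get(f"{prefix}_SERVER")
    if not server:
        raise KeyError(f"{prefix}_SERVER is not set")
    proxy = {"server": server}
    username = env.get(f"{prefix}_USERNAME")
    password = env.get(f"{prefix}_PASSWORD")
    if username and password:
        proxy["username"] = username
        proxy["password"] = password
    return proxy


def describe_proxy(proxy: Mapping[str, str]) -> str:
    """Return a log-safe description that never includes the password."""
    user = proxy.get("username", "<none>")
    masked = "***" if proxy.get("password") else "<none>"
    return f"{proxy['server']} (user={user}, password={masked})"
```

The returned dict has the same keys as the `proxy_settings` dict built in `test_proxy`, so it could be passed to `chromium.launch(proxy=...)` unchanged.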

View File

@@ -1,282 +0,0 @@
"""
对比测试有头模式和无头模式的页面获取情况
"""
import asyncio
from playwright.async_api import async_playwright
import sys
async def test_headless_comparison(proxy_index: int = 0):
"""对比测试有头模式和无头模式"""
print(f"\n{'='*60}")
print(f"🔍 对比测试有头模式 vs 无头模式")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
    # Build the proxy object; split on the last '@' and the first ':' so that
    # passwords containing those characters are handled correctly
    proxy_parts = proxy_url.replace('http://', '').replace('https://', '').rsplit('@', 1)
    if len(proxy_parts) == 2:
        auth_part = proxy_parts[0]
        server_part = proxy_parts[1]
        username, password = auth_part.split(':', 1)
        proxy_config_obj = {
            "server": f"http://{server_part}",
            "username": username,
            "password": password
        }
    else:
        proxy_config_obj = {"server": proxy_url}
print(f" 配置的代理对象: {proxy_config_obj}")
# 测试无头模式
print(f"\n🧪 测试 1/2: 无头模式 (headless=True)")
await test_single_mode(True, proxy_config_obj)
print(f"\n🧪 测试 2/2: 有头模式 (headless=False)")
await test_single_mode(False, proxy_config_obj)
print(f"\n{'='*60}")
print("✅ 对比测试完成!")
print("="*60)
async def test_single_mode(headless: bool, proxy_config_obj: dict):
"""测试单个模式"""
mode_name = "无头模式" if headless else "有头模式"
print(f" 正在启动浏览器 ({mode_name})...")
try:
async with async_playwright() as p:
# 启动浏览器
browser = await p.chromium.launch(
headless=headless,
proxy=proxy_config_obj,
# 添加一些额外参数以提高稳定性
args=[
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled',
]
)
# 创建上下文
context = await browser.new_context(
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
viewport={'width': 1280, 'height': 720}
)
# 创建页面
page = await context.new_page()
# 访问小红书登录页面
print(f" 访问小红书登录页...")
try:
# 使用不同的wait_until策略
await page.goto('https://creator.xiaohongshu.com/login',
wait_until='domcontentloaded',
timeout=15000)
# 等待一段时间让页面内容加载
await asyncio.sleep(3)
# 获取页面信息
title = await page.title()
url = page.url
content = await page.content()
content_len = len(content)
print(f"{mode_name} - 访问成功")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
# 检查关键元素
phone_input = await page.query_selector('input[placeholder="手机号"]')
if phone_input:
print(f" ✅ 找到手机号输入框")
else:
print(f" ❌ 未找到手机号输入框")
# 查找所有input元素
inputs = await page.query_selector_all('input')
print(f" 找到 {len(inputs)} 个input元素")
if content_len == 0:
print(f" ⚠️ 页面内容为空")
elif "验证" in content or "captcha" in content.lower() or "安全" in content:
print(f" ⚠️ 检测到验证或安全提示")
else:
print(f" ✅ 页面内容正常")
except Exception as e:
print(f"{mode_name} - 访问失败: {str(e)}")
await browser.close()
print(f" 🔄 {mode_name} 浏览器已关闭")
except Exception as e:
print(f"{mode_name} - 测试异常: {str(e)}")
async def test_with_different_wait_strategies(proxy_index: int = 0):
"""测试不同的页面等待策略"""
print(f"\n{'='*60}")
print(f"🔍 测试不同页面等待策略")
print(f"{'='*60}")
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
    # Split on the last '@' and the first ':' so that passwords containing
    # those characters are handled correctly
    proxy_parts = proxy_url.replace('http://', '').replace('https://', '').rsplit('@', 1)
    if len(proxy_parts) == 2:
        auth_part = proxy_parts[0]
        server_part = proxy_parts[1]
        username, password = auth_part.split(':', 1)
        proxy_config_obj = {
            "server": f"http://{server_part}",
            "username": username,
            "password": password
        }
    else:
        proxy_config_obj = {"server": proxy_url}
wait_strategies = [
('domcontentloaded', 'DOM内容加载完成'),
('load', '页面完全加载'),
('networkidle', '网络空闲'),
('commit', '导航提交')
]
for wait_strategy, description in wait_strategies:
print(f"\n🧪 测试等待策略: {description} ({wait_strategy})")
try:
async with async_playwright() as p:
browser = await p.chromium.launch(
headless=True, # 使用无头模式进行测试
proxy=proxy_config_obj
)
context = await browser.new_context(
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)
page = await context.new_page()
try:
print(f" 访问小红书登录页 (wait_until='{wait_strategy}')...")
await page.goto('https://creator.xiaohongshu.com/login',
wait_until=wait_strategy,
timeout=15000)
# 额外等待时间
await asyncio.sleep(2)
content = await page.content()
content_len = len(content)
print(f" ✅ 访问成功")
print(f" 内容长度: {content_len} 字符")
# 检查手机号输入框
phone_input = await page.query_selector('input[placeholder="手机号"]')
if phone_input:
print(f" ✅ 找到手机号输入框")
else:
print(f" ❌ 未找到手机号输入框")
except Exception as e:
print(f" ❌ 访问失败: {str(e)}")
await browser.close()
except Exception as e:
print(f" ❌ 测试异常: {str(e)}")
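def explain_page_loading_factors():
    """Explain the factors that affect page loading."""
    print("="*60)
    print("💡 Factors that affect page loading")
    print("="*60)
    print("\n1. Browser mode:")
    print("   • Headed: the browser window is visible and rendering is more complete")
    print("   • Headless: runs in the background; loading behavior may differ slightly")
    print("\n2. Page wait strategy:")
    print("   • domcontentloaded: DOM has been built (recommended)")
    print("   • load: all resources have loaded")
    print("   • networkidle: network is idle (may wait a long time)")
    print("\n3. Anti-detection measures:")
    print("   • Browser fingerprint obfuscation")
    print("   • User-Agent setting")
    print("   • Disabling the webdriver property")
    print("\n4. Network factors:")
    print("   • Proxy IP quality")
    print("   • Network latency")
    print("   • Anti-scraping mechanisms of the target site")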
async def main():
"""主函数"""
explain_page_loading_factors()
print(f"\n{'='*60}")
print("🎯 选择测试模式")
print("="*60)
print("\n1. 有头模式 vs 无头模式对比测试")
print("2. 不同页面等待策略测试")
try:
choice = input("\n请选择测试模式 (1-2, 默认为1): ").strip()
if choice not in ['1', '2']:
choice = '1'
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
if choice == '1':
await test_headless_comparison(proxy_idx)
elif choice == '2':
await test_with_different_wait_strategies(proxy_idx)
print(f"\n{'='*60}")
print("✅ 测试完成!")
print("="*60)
except KeyboardInterrupt:
print("\n\n⚠️ 测试被用户中断")
except Exception as e:
print(f"\n❌ 测试过程中出现错误: {str(e)}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())
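
The functions in this script first assemble a `http://user:password@host:port` URL and then split it by hand to obtain the `server`, `username` and `password` keys that Playwright's `proxy` option takes. A minimal sketch of doing that conversion with the standard library is given below. The helper name `proxy_url_to_playwright` is hypothetical, and the sketch assumes that any special characters in the credentials are percent-encoded in the URL.

```python
from typing import Dict
from urllib.parse import unquote, urlsplit


def proxy_url_to_playwright(proxy_url: str) -> Dict[str, str]:
    """Convert scheme://user:pass@host:port into a Playwright proxy dict."""
    parts = urlsplit(proxy_url)
    if not parts.hostname:
        raise ValueError(f"not a valid proxy URL: {proxy_url!r}")
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    proxy = {"server": server}
    if parts.username is not None:
        # urlsplit leaves percent-escapes in place, so decode them here
        proxy["username"] = unquote(parts.username)
        proxy["password"] = unquote(parts.password or "")
    return proxy
```

Since `get_proxy_config` already returns the three fields separately, the dict could also be built directly from it without going through a URL at all.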

View File

@@ -1,356 +0,0 @@
"""
使用代理并开启有头模式的示例
展示如何在使用代理的同时开启浏览器界面
"""
import asyncio
from playwright.async_api import async_playwright
import sys
async def test_proxy_with_headless_false(proxy_index: int = 0):
"""使用代理并开启有头模式测试"""
print(f"\n{'='*60}")
print(f"🔍 测试代理 + 有头模式")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
print(f" 有头模式: 开启")
try:
async with async_playwright() as p:
            # Configure the proxy; split on the last '@' and the first ':' so that
            # passwords containing those characters are handled correctly
            proxy_parts = proxy_url.replace('http://', '').replace('https://', '').rsplit('@', 1)
            if len(proxy_parts) == 2:
                auth_part = proxy_parts[0]
                server_part = proxy_parts[1]
                username, password = auth_part.split(':', 1)
                proxy_config_obj = {
                    "server": f"http://{server_part}",
                    "username": username,
                    "password": password
                }
            else:
                proxy_config_obj = {"server": proxy_url}
print(f" 配置的代理对象: {proxy_config_obj}")
# 启动浏览器 - 使用有头模式
browser = await p.chromium.launch(
headless=False, # 有头模式,可以看到浏览器界面
proxy=proxy_config_obj
)
# 创建上下文
context = await browser.new_context(
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)
# 创建页面
page = await context.new_page()
print(f"\n🌐 访问百度测试代理连接...")
try:
await page.goto('https://www.baidu.com', wait_until='networkidle', timeout=15000)
await asyncio.sleep(2)
title = await page.title()
url = page.url
print(f" ✅ 百度访问成功")
print(f" 标题: {title}")
print(f" URL: {url}")
except Exception as e:
print(f" ❌ 百度访问失败: {str(e)}")
print(f"\n🌐 访问小红书创作者平台...")
try:
await page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=15000)
await asyncio.sleep(3)
title = await page.title()
url = page.url
content_len = len(await page.content())
print(f" 访问结果:")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
if content_len == 0:
print(f" ⚠️ 页面内容为空")
else:
print(f" ✅ 页面加载成功")
except Exception as e:
print(f" ❌ 小红书访问失败: {str(e)}")
print(f"\n⏸️ 浏览器保持打开状态,您可以观察页面")
print(f" 代理正在生效,您可以看到浏览器界面")
print(f" 按 Enter 键关闭浏览器...")
# 等待用户输入
input()
await browser.close()
print(f"✅ 浏览器已关闭")
except Exception as e:
print(f"❌ 测试过程异常: {str(e)}")
import traceback
traceback.print_exc()
async def test_xhs_login_with_headless_false(phone: str, proxy_index: int = 0):
"""
使用有头模式测试小红书登录流程
Args:
phone: 手机号
proxy_index: 代理索引 (0 或 1)
"""
print(f"\n{'='*60}")
print(f"📱 使用有头模式测试小红书登录")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
print(f" 手机号: {phone}")
print(f" 有头模式: 开启")
# 创建登录服务,使用有头模式
from xhs_login import XHSLoginService
login_service = XHSLoginService(use_pool=False) # 不使用池,便于调试
try:
# 初始化浏览器(使用代理 + 有头模式)
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
# 注意XHSLoginService 内部使用了浏览器池模式,我们先看看如何修改它来支持有头模式
print(" 正在启动浏览器(使用代理 + 有头模式)...")
# 直接使用Playwright创建有头模式的浏览器
async with async_playwright() as p:
            # Configure the proxy; split on the last '@' and the first ':' so that
            # passwords containing those characters are handled correctly
            proxy_parts = proxy_url.replace('http://', '').replace('https://', '').rsplit('@', 1)
            if len(proxy_parts) == 2:
                auth_part = proxy_parts[0]
                server_part = proxy_parts[1]
                username, password = auth_part.split(':', 1)
                proxy_config_obj = {
                    "server": f"http://{server_part}",
                    "username": username,
                    "password": password
                }
            else:
                proxy_config_obj = {"server": proxy_url}
# 启动浏览器 - 有头模式
browser = await p.chromium.launch(
headless=False, # 有头模式
proxy=proxy_config_obj
)
context = await browser.new_context(
user_agent=user_agent,
viewport={'width': 1280, 'height': 720}
)
page = await context.new_page()
print("✅ 浏览器启动成功(有头模式 + 代理)")
# 访问小红书登录页面
print(f"\n🌐 访问小红书创作者平台登录页...")
await page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=30000)
await asyncio.sleep(2)
print(f"✅ 进入登录页面")
print(f" 当前URL: {page.url}")
# 查找手机号输入框
print(f"\n🔍 查找手机号输入框...")
try:
# 尝试多种选择器
phone_input_selectors = [
'input[placeholder="手机号"]',
'input[placeholder*="手机"]',
'input[type="tel"]',
'input[type="text"]'
]
phone_input = None
for selector in phone_input_selectors:
try:
phone_input = await page.wait_for_selector(selector, timeout=3000)
if phone_input:
print(f" ✅ 找到手机号输入框: {selector}")
break
except Exception:
continue
if phone_input:
# 输入手机号
await phone_input.fill(phone)
print(f" ✅ 已输入手机号: {phone}")
# 等待界面更新
await asyncio.sleep(1)
# 查找发送验证码按钮
print(f"\n🔍 查找发送验证码按钮...")
code_button_selectors = [
'text="发送验证码"',
'text="获取验证码"',
'button:has-text("验证码")',
'button:has-text("发送")',
'div:has-text("验证码")'
]
code_button = None
for selector in code_button_selectors:
try:
code_button = await page.wait_for_selector(selector, timeout=3000)
if code_button:
print(f" ✅ 找到验证码按钮: {selector}")
break
except Exception:
continue
if code_button:
print(f"\n 已找到手机号输入框和验证码按钮")
print(f" 您可以在浏览器中手动点击发送验证码")
print(f" 验证码将发送到: {phone}")
print(f"\n⏸️ 浏览器保持打开状态,您可以手动操作")
print(f" 按 Enter 键关闭浏览器...")
input()
else:
print(f" ❌ 未找到发送验证码按钮")
else:
print(f" ❌ 未找到手机号输入框")
print(f"\n📄 页面上可用的输入框:")
inputs = await page.query_selector_all('input')
for i, inp in enumerate(inputs):
try:
placeholder = await inp.get_attribute('placeholder')
input_type = await inp.get_attribute('type')
print(f" 输入框 {i+1}: type={input_type}, placeholder={placeholder}")
except Exception:
continue
except Exception as e:
print(f" ❌ 操作失败: {str(e)}")
# 保持浏览器打开供用户观察
print(f"\n⏸️ 浏览器保持打开状态,您可以观察页面元素")
print(f" 按 Enter 键关闭浏览器...")
input()
await browser.close()
print(f"✅ 浏览器已关闭")
except Exception as e:
print(f"❌ 测试过程异常: {str(e)}")
import traceback
traceback.print_exc()
def show_headless_comparison():
"""显示有头模式和无头模式的对比"""
print("="*60)
print("💡 有头模式 vs 无头模式对比")
print("="*60)
print("\n有头模式 (headless=False):")
print(" ✅ 优点:")
print(" • 可以看到浏览器界面,便于调试")
print(" • 可以观察页面加载过程")
print(" • 可以手动与页面交互")
print(" • 有助于识别页面元素选择器")
print("")
print(" ❌ 缺点:")
print(" • 占用屏幕空间")
print(" • 可能影响用户其他操作")
print(" • 资源消耗稍大")
print("\n无头模式 (headless=True):")
print(" ✅ 优点:")
print(" • 不显示浏览器界面,后台运行")
print(" • 资源消耗较少")
print(" • 适合自动化任务")
print(" • 可以在服务器环境运行")
print("")
print(" ❌ 缺点:")
print(" • 无法直观看到页面")
print(" • 调试相对困难")
print("\n🎯 使用建议:")
print(" • 开发调试时使用有头模式")
print(" • 生产环境使用无头模式")
print(" • 代理配置在两种模式下都有效")
async def main():
"""主函数"""
show_headless_comparison()
print(f"\n{'='*60}")
print("🎯 选择测试模式")
print("="*60)
print("\n1. 基础代理 + 有头模式测试")
print("2. 小红书登录 + 有头模式测试")
try:
choice = input("\n请选择测试模式 (1-2, 默认为1): ").strip()
if choice not in ['1', '2']:
choice = '1'
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
if choice == '1':
await test_proxy_with_headless_false(proxy_idx)
elif choice == '2':
phone = input("请输入手机号: ").strip()
if not phone:
print("❌ 手机号不能为空")
return
await test_xhs_login_with_headless_false(phone, proxy_idx)
print(f"\n{'='*60}")
print("✅ 测试完成!")
print("="*60)
except KeyboardInterrupt:
print("\n\n⚠️ 测试被用户中断")
except Exception as e:
print(f"\n❌ 测试过程中出现错误: {str(e)}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())

View File

@@ -1,261 +0,0 @@
"""
小红书验证码登录流程测试脚本
测试完整的验证码发送和登录流程
"""
import asyncio
import sys
from xhs_login import XHSLoginService
async def test_send_verification_code(phone: str, proxy_index: int = 0):
"""
测试发送验证码流程
Args:
phone: 手机号
proxy_index: 代理索引 (0 或 1)
"""
print(f"\n{'='*60}")
print(f"📱 测试发送验证码流程")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
print(f" 手机号: {phone}")
# 创建登录服务
login_service = XHSLoginService()
try:
# 初始化浏览器(使用代理)
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
print("✅ 浏览器初始化成功(已启用代理)")
# 发送验证码
print(f"\n📤 正在发送验证码到 {phone}...")
result = await login_service.send_verification_code(phone)
if result.get('success'):
print(f"✅ 验证码发送成功!")
print(f" 消息: {result.get('message')}")
return login_service # 返回服务实例供后续登录使用
else:
print(f"❌ 验证码发送失败: {result.get('error')}")
return None
except Exception as e:
print(f"❌ 发送验证码过程异常: {str(e)}")
import traceback
traceback.print_exc()
return None
async def test_login_with_code(login_service: XHSLoginService, phone: str, code: str):
"""
测试使用验证码登录
Args:
login_service: XHSLoginService实例
phone: 手机号
code: 验证码
"""
print(f"\n{'='*60}")
print(f"🔑 测试使用验证码登录")
print(f"{'='*60}")
print(f" 手机号: {phone}")
print(f" 验证码: {code}")
try:
# 执行登录
result = await login_service.login(phone, code)
if result.get('success'):
print("✅ 登录成功!")
# 显示获取到的Cookies信息
cookies = result.get('cookies', {})
print(f" 获取到 {len(cookies)} 个Cookie")
# 保存完整Cookies到文件
cookies_full = result.get('cookies_full', [])
if cookies_full:
import json
with open('cookies.json', 'w', encoding='utf-8') as f:
json.dump(cookies_full, f, ensure_ascii=False, indent=2)
print(" ✅ 已保存完整Cookies到 cookies.json")
# 显示用户信息
user_info = result.get('user_info', {})
if user_info:
print(f" 用户信息: {list(user_info.keys())}")
return result
else:
print(f"❌ 登录失败: {result.get('error')}")
return result
except Exception as e:
print(f"❌ 登录过程异常: {str(e)}")
import traceback
traceback.print_exc()
return {"success": False, "error": str(e)}
async def test_complete_login_flow(phone: str, code: str = None, proxy_index: int = 0):
"""
测试完整的登录流程
Args:
phone: 手机号
code: 验证码如果为None则只测试发送验证码
proxy_index: 代理索引
"""
print("="*60)
print("🔄 测试完整登录流程")
print("="*60)
# 步骤1: 发送验证码
print("\n📋 步骤1: 发送验证码")
login_service = await test_send_verification_code(phone, proxy_index)
if not login_service:
print("❌ 发送验证码失败,终止流程")
return
# 如果提供了验证码,则执行登录
if code:
print("\n📋 步骤2: 使用验证码登录")
result = await test_login_with_code(login_service, phone, code)
if result.get('success'):
print("\n🎉 完整登录流程成功!")
else:
print(f"\n❌ 完整登录流程失败: {result.get('error')}")
else:
print("\n⚠️ 提供了验证码参数才可完成登录步骤")
print(" 请在手机上查看验证码,然后调用登录方法")
# 清理资源
await login_service.close_browser()
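async def test_multiple_proxies_login(phone: str, proxy_indices: list = None):
    """
    Test sending the verification code through several proxies.
    Args:
        phone: phone number
        proxy_indices: list of proxy indices (defaults to [0, 1])
    """
    if proxy_indices is None:
        proxy_indices = [0, 1]  # avoid a shared mutable default argument
    print("="*60)
    print("🔄 Multi-proxy login test")
    print("="*60)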
for i, proxy_idx in enumerate(proxy_indices):
print(f"\n🧪 测试代理 {proxy_idx + 1} (第 {i+1} 次尝试)")
# 由于验证码只能发送一次,这里只测试发送验证码
login_service = await test_send_verification_code(phone, proxy_idx)
if login_service:
print(f" ✅ 代理 {proxy_idx + 1} 发送验证码成功")
await login_service.close_browser()
else:
print(f" ❌ 代理 {proxy_idx + 1} 发送验证码失败")
# 在测试之间添加延迟
if i < len(proxy_indices) - 1:
print(" ⏳ 等待3秒后测试下一个代理...")
await asyncio.sleep(3)
def show_usage_examples():
    """Print usage examples."""
    print("="*60)
    print("💡 Usage examples")
    print("="*60)
    print("\n1️⃣ Send the verification code only:")
    print("   # Send a verification code to the phone number through proxy 1")
    print("   await test_send_verification_code('13800138000', proxy_index=0)")
    print("\n2️⃣ Complete login flow:")
    print("   # Full flow: send the verification code, then log in")
    print("   await test_complete_login_flow('13800138000', '123456', proxy_index=0)")
    print("\n3️⃣ Multi-proxy test:")
    print("   # Test several proxies")
    print("   await test_multiple_proxies_login('13800138000', [0, 1])")
async def main():
"""主函数"""
show_usage_examples()
print(f"\n{'='*60}")
print("🎯 选择测试模式")
print("="*60)
print("\n1. 发送验证码测试")
print("2. 完整登录流程测试")
print("3. 多代理测试")
try:
choice = input("\n请选择测试模式 (1-3, 默认为1): ").strip()
if choice not in ['1', '2', '3']:
choice = '1'
phone = input("请输入手机号: ").strip()
if not phone:
print("❌ 手机号不能为空")
return
if choice == '1':
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
await test_send_verification_code(phone, proxy_idx)
elif choice == '2':
code = input("请输入验证码 (留空则只测试发送): ").strip()
proxy_choice = input("请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
await test_complete_login_flow(phone, code if code else None, proxy_idx)
elif choice == '3':
await test_multiple_proxies_login(phone)
print(f"\n{'='*60}")
print("✅ 测试完成!")
print("="*60)
except KeyboardInterrupt:
print("\n\n⚠️ 测试被用户中断")
except Exception as e:
print(f"\n❌ 测试过程中出现错误: {str(e)}")
import traceback
traceback.print_exc()
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())

View File

@@ -1,106 +0,0 @@
"""
测试登录页面配置功能
验证通过配置文件控制登录页面类型creator vs home
"""
import sys
from config import load_config
def test_config_reading():
"""测试配置读取"""
print("="*60)
print("测试配置文件读取")
print("="*60)
# 测试dev配置
print("\n1. 测试开发环境配置 (config.dev.yaml)")
config_dev = load_config('dev')
login_page = config_dev.get_str('login.page', 'creator')
login_headless = config_dev.get_bool('login.headless', False)
print(f" login.page = {login_page}")
print(f" login.headless = {login_headless}")
# 根据配置决定预热URL
if login_page == "home":
preheat_url = "https://www.xiaohongshu.com"
else:
preheat_url = "https://creator.xiaohongshu.com/login"
print(f" 预热URL = {preheat_url}")
# 测试prod配置
print("\n2. 测试生产环境配置 (config.prod.yaml)")
config_prod = load_config('prod')
login_page_prod = config_prod.get_str('login.page', 'creator')
login_headless_prod = config_prod.get_bool('login.headless', False)
print(f" login.page = {login_page_prod}")
print(f" login.headless = {login_headless_prod}")
if login_page_prod == "home":
preheat_url_prod = "https://www.xiaohongshu.com"
else:
preheat_url_prod = "https://creator.xiaohongshu.com/login"
print(f" 预热URL = {preheat_url_prod}")
print("\n" + "="*60)
print("✅ 配置读取测试完成")
print("="*60)
def test_api_parameter_override():
"""测试API参数覆盖配置"""
print("\n" + "="*60)
print("测试API参数覆盖配置")
print("="*60)
config = load_config('dev')
default_login_page = config.get_str('login.page', 'creator')
# 模拟不同的API参数情况
test_cases = [
(None, "应使用配置默认值"),
("creator", "API指定creator"),
("home", "API指定home"),
]
for api_param, description in test_cases:
login_page = api_param if api_param else default_login_page
print(f"\n场景: {description}")
print(f" 配置默认值 = {default_login_page}")
print(f" API参数 = {api_param}")
print(f" 最终使用 = {login_page}")
# 决定URL
if login_page == "home":
url = "https://www.xiaohongshu.com"
page_name = "小红书首页"
else:
url = "https://creator.xiaohongshu.com/login"
page_name = "创作者中心"
print(f" → 将访问: {page_name} ({url})")
print("\n" + "="*60)
print("✅ API参数覆盖测试完成")
print("="*60)
if __name__ == "__main__":
try:
test_config_reading()
test_api_parameter_override()
print("\n🎉 所有测试通过!")
print("\n使用说明:")
print("1. 在 config.dev.yaml 或 config.prod.yaml 中修改 login.page 配置")
print("2. 可选值: creator (创作者中心) 或 home (小红书首页)")
print("3. API请求中的 login_page 参数可以覆盖配置文件的默认值")
print("4. 如果API请求不传 login_page 参数,将使用配置文件中的默认值")
except Exception as e:
print(f"\n❌ 测试失败: {str(e)}")
import traceback
traceback.print_exc()
sys.exit(1)
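The override rule exercised by the script above (an explicit API `login_page` value wins, otherwise the configured default applies) could be captured in one small helper. This is only a sketch: `resolve_login_url` and `LOGIN_URLS` are hypothetical names that do not appear in the repository, and the two URLs are simply the ones used in the test.

```python
from typing import Optional, Tuple

# Hypothetical lookup table; the two URLs are copied from the test script.
LOGIN_URLS = {
    "home": "https://www.xiaohongshu.com",
    "creator": "https://creator.xiaohongshu.com/login",
}


def resolve_login_url(api_param: Optional[str], config_default: str = "creator") -> Tuple[str, str]:
    """Return (login_page, url); a non-empty API parameter overrides the config default."""
    login_page = api_param if api_param else config_default
    # Mirror the test script: anything other than "home" falls back to the creator login page.
    if login_page == "home":
        return login_page, LOGIN_URLS["home"]
    return login_page, LOGIN_URLS["creator"]


if __name__ == "__main__":
    for api_param in (None, "creator", "home"):
        print(api_param, "->", resolve_login_url(api_param, config_default="creator"))
```

Keeping the rule in one function would let the three scenarios be asserted directly instead of printed and checked by eye.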


@@ -1,80 +0,0 @@
"""
测试网络图片下载功能
"""
import asyncio
import json
from xhs_publish import XHSPublishService
async def test_network_images():
"""测试网络图片功能"""
print("="*50)
print("网络图片下载功能测试")
print("="*50)
print()
# 1. 准备测试 Cookie从 cookies.json 读取)
try:
with open('cookies.json', 'r', encoding='utf-8') as f:
cookies = json.load(f)
print(f"✅ 成功读取 {len(cookies)} 个 Cookie")
except FileNotFoundError:
print("❌ cookies.json 文件不存在")
print("请先运行登录获取 Cookie:")
print(" python xhs_cli.py login <手机号> <验证码>")
return
# 2. 准备测试数据
title = "【测试】网络图片发布测试"
content = """测试使用网络 URL 图片发布笔记 📸
本次测试使用了:
✅ 网络 URL 图片picsum.photos
✅ 自动下载功能
✅ 临时文件管理
如果你看到这条笔记,说明网络图片功能正常!"""
# 3. 使用网络图片 URL
images = [
"https://picsum.photos/800/600?random=test1",
"https://picsum.photos/800/600?random=test2",
"https://picsum.photos/800/600?random=test3"
]
print(f"\n测试图片 URL:")
for i, url in enumerate(images, 1):
print(f" {i}. {url}")
tags = ["测试", "网络图片", "自动发布"]
# 4. 创建发布服务
print("\n开始测试发布...")
publisher = XHSPublishService(cookies)
# 5. 执行发布
result = await publisher.publish(
title=title,
content=content,
images=images,
tags=tags,
cleanup=True # 自动清理临时文件
)
# 6. 显示结果
print("\n" + "="*50)
print("测试结果:")
print(json.dumps(result, ensure_ascii=False, indent=2))
print("="*50)
if result.get('success'):
print("\n✅ 测试成功!网络图片功能正常")
if 'url' in result:
print(f"📎 笔记链接: {result['url']}")
else:
print(f"\n❌ 测试失败: {result.get('error')}")
if __name__ == "__main__":
asyncio.run(test_network_images())


@@ -1,246 +0,0 @@
"""
优化的代理浏览器配置
解决小红书对代理IP的限制问题
"""
import asyncio
from playwright.async_api import async_playwright
import sys
async def test_optimized_proxy_browser(proxy_index: int = 0):
"""测试优化的代理浏览器配置"""
print(f"\n{'='*60}")
print(f"🚀 测试优化的代理浏览器配置")
print(f"{'='*60}")
# 从代理配置获取代理信息
from damai_proxy_config import get_proxy_config
proxy_config = get_proxy_config(proxy_index)
proxy_server = proxy_config['server'].replace('http://', '')
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_server}"
print(f"✅ 使用代理: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
try:
async with async_playwright() as p:
# 配置代理
proxy_parts = proxy_url.replace('http://', '').replace('https://', '').split('@')
if len(proxy_parts) == 2:
auth_part = proxy_parts[0]
server_part = proxy_parts[1]
username, password = auth_part.split(':')
proxy_config_obj = {
"server": f"http://{server_part}",
"username": username,
"password": password
}
else:
proxy_config_obj = {"server": proxy_url}
print(f" 配置的代理对象: {proxy_config_obj}")
# 启动浏览器 - 使用优化参数
browser = await p.chromium.launch(
headless=False, # 使用有头模式,便于观察
proxy=proxy_config_obj,
args=[
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-blink-features=AutomationControlled',
'--disable-background-timer-throttling',
'--disable-renderer-backgrounding',
'--disable-background-networking',
'--enable-features=NetworkService,NetworkServiceInProcess',
'--disable-ipc-flooding-protection',
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process',
'--disable-site-isolation-trials',
'--disable-extensions',
'--disable-breakpad',
'--disable-component-extensions-with-background-pages',
'--disable-hang-monitor',
'--disable-prompt-on-repost',
'--disable-domain-reliability',
'--disable-component-update',
'--hide-scrollbars',
'--mute-audio',
'--no-first-run',
'--no-default-browser-check',
'--metrics-recording-only',
'--force-color-profile=srgb',
'--disable-default-apps',
'--disable-features=TranslateUI',
'--disable-features=Translate',
'--disable-features=OptimizationHints',
'--disable-features=InterestCohortAPI',
'--disable-features=BlinkGenPropertyTrees',
'--disable-features=ImprovedCookieControls',
'--disable-features=SameSiteDefaultChecksMethodRigorously',
'--disable-features=CookieSameSiteByDefaultWhenReportingEnabled',
'--disable-features=AutofillServerCommunication',
'--disable-features=AutofillUseOptimizedLocalStorage',
'--disable-features=CalculateNativeWinOcclusion',
'--disable-features=VizDisplayCompositor',
'--disable-features=VizHitTestQuery',
]
)
# 创建上下文 - 设置浏览器指纹混淆
context = await browser.new_context(
user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
viewport={'width': 1280, 'height': 720},
# 隐瞒自动化特征
bypass_csp=True,
java_script_enabled=True,
)
# 创建页面
page = await context.new_page()
# 隐瞒自动化特征
await page.add_init_script("""
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined,
});
Object.defineProperty(navigator, 'plugins', {
get: () => [1, 2, 3, 4, 5],
});
Object.defineProperty(navigator, 'languages', {
get: () => ['zh-CN', 'zh', 'en'],
});
// 隐瞒代理检测
delete window.cdc_adoQpoasnfa76pfcZLmcfl_Array;
delete window.cdc_adoQpoasnfa76pfcZLmcfl_Promise;
delete window.cdc_adoQpoasnfa76pfcZLmcfl_Symbol;
""")
print(f"\n🌐 访问百度测试代理连接...")
try:
await page.goto('https://www.baidu.com', wait_until='domcontentloaded', timeout=15000)
await asyncio.sleep(2)
title = await page.title()
url = page.url
print(f" ✅ 百度访问成功")
print(f" 标题: {title}")
print(f" URL: {url}")
except Exception as e:
print(f" ❌ 百度访问失败: {str(e)}")
print(f"\n🌐 访问小红书创作者平台...")
try:
await page.goto('https://creator.xiaohongshu.com/login', wait_until='domcontentloaded', timeout=30000)
await asyncio.sleep(3) # 等待更长时间
title = await page.title()
url = page.url
content = await page.content()
content_len = len(content)
print(f" 访问结果:")
print(f" 标题: {title}")
print(f" URL: {url}")
print(f" 内容长度: {content_len} 字符")
if content_len == 0:
print(f" ⚠️ 页面内容为空")
elif "验证" in content or "captcha" in content.lower() or "安全" in content:
print(f" ⚠️ 检测到验证或安全提示")
else:
print(f" ✅ 页面加载成功")
# 查找手机号输入框
print(f"\n🔍 查找手机号输入框...")
try:
phone_input = await page.wait_for_selector('input[placeholder="手机号"]', timeout=5000)
if phone_input:
print(f" ✅ 找到手机号输入框")
else:
print(f" ❌ 未找到手机号输入框")
except Exception:
print(f" ❌ 未找到手机号输入框")
# 查找所有input元素
inputs = await page.query_selector_all('input')
print(f" 找到 {len(inputs)} 个input元素")
# 查找发送验证码按钮
print(f"\n🔍 查找发送验证码按钮...")
try:
code_button = await page.wait_for_selector('text="发送验证码"', timeout=5000)
if code_button:
print(f" ✅ 找到发送验证码按钮")
else:
print(f" ❌ 未找到发送验证码按钮")
except Exception:
print(f" ❌ 未找到发送验证码按钮")
except Exception as e:
print(f" ❌ 小红书访问失败: {str(e)}")
print(f"\n⏸️ 浏览器保持打开状态,您可以观察页面")
print(f" 按 Enter 键关闭浏览器...")
input()
await browser.close()
print(f"✅ 浏览器已关闭")
except Exception as e:
print(f"❌ 测试过程异常: {str(e)}")
import traceback
traceback.print_exc()
def explain_optimizations():
"""解释优化措施"""
print("="*60)
print("🔧 优化措施说明")
print("="*60)
print("\n1. 浏览器启动参数优化:")
print(" • 添加更多反检测参数")
print(" • 禁用可能导致检测的功能")
print("\n2. 浏览器指纹混淆:")
print(" • 隐瞒webdriver特征")
print(" • 伪造插件列表")
print(" • 设置真实语言")
print("\n3. 页面加载策略:")
print(" • 使用domcontentloaded而非networkidle")
print(" • 增加超时时间")
async def main():
"""主函数"""
explain_optimizations()
print(f"\n{'='*60}")
print("🎯 选择代理进行测试")
print("="*60)
proxy_choice = input("\n请选择代理 (0 或 1, 默认为0): ").strip()
if proxy_choice not in ['0', '1']:
proxy_choice = '0'
proxy_idx = int(proxy_choice)
await test_optimized_proxy_browser(proxy_idx)
print(f"\n{'='*60}")
print("✅ 测试完成!")
print("="*60)
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())
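The script above rebuilds a proxy URL and then splits it by hand on `@` and `:`, which would misparse credentials that contain either character. A minimal sketch of the same conversion using the standard library is shown below. `proxy_url_to_playwright` is a hypothetical helper name; the output shape (`server`, `username`, `password`) is the proxy dict the script already passes to `chromium.launch`. It assumes a hostname or IPv4 proxy address; IPv6 literals are not handled.

```python
from typing import Dict
from urllib.parse import unquote, urlsplit


def proxy_url_to_playwright(proxy_url: str) -> Dict[str, str]:
    """Convert 'http://user:pass@host:port' into a Playwright-style proxy dict."""
    if "://" not in proxy_url:
        # Assume plain HTTP when no scheme is given.
        proxy_url = "http://" + proxy_url
    parts = urlsplit(proxy_url)
    if not parts.hostname:
        # Do not echo the URL here: it may contain credentials.
        raise ValueError("proxy URL has no host")
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    proxy = {"server": server}
    # urlsplit does not decode percent-escapes, so decode credentials explicitly.
    if parts.username:
        proxy["username"] = unquote(parts.username)
    if parts.password:
        proxy["password"] = unquote(parts.password)
    return proxy


if __name__ == "__main__":
    print(proxy_url_to_playwright("http://user:p%40ss@1.2.3.4:8080"))
```

Since the config entries already carry `server`, `username` and `password` separately, the round trip through a URL string may not be needed at all; the sketch only matters where a URL string is the input.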


@@ -1,51 +0,0 @@
"""
Test the OSS upload feature
"""
import sys
from oss_utils import OSSUploader


def test_oss_connection():
    """Test the connection to Alibaba Cloud OSS."""
    print("=" * 60)
    print("Testing the Alibaba Cloud OSS connection")
    print("=" * 60)
    try:
        # Create the OSS uploader
        uploader = OSSUploader()
        print("\n✅ OSS configuration:")
        print(f" Bucket: {uploader.bucket_name}")
        print(f" Endpoint: {uploader.endpoint}")
        print(f" Access Key ID: {uploader.access_key_id[:8]}...")
        # Check that the bucket is reachable
        try:
            # List objects in the bucket (at most one)
            result = uploader.bucket.list_objects(prefix=uploader.base_path, max_keys=1)
            print("\n✅ Bucket access succeeded!")
            print(f" Base path: {uploader.base_path}")
            if result.object_list:
                print(f" Sample object: {result.object_list[0].key}")
        except Exception as e:
            print(f"\n❌ Bucket access failed: {e}")
            return False
        print("\n" + "=" * 60)
        print("✅ OSS configuration test passed!")
        print("=" * 60)
        return True
    except Exception as e:
        print(f"\n❌ OSS initialization failed: {e}")
        print("\nPlease check the configuration:")
        print(" 1. The Access Key ID and Secret are correct")
        print(" 2. The bucket name is correct")
        print(" 3. The endpoint region matches the bucket")
        return False


if __name__ == "__main__":
    success = test_oss_connection()
    sys.exit(0 if success else 1)


@@ -1,16 +0,0 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import hashlib

passwords = [
    "123456",
    "password",
    "admin123",
]
print("=== Python SHA256 password hashing test ===")
for pwd in passwords:
    hash_result = hashlib.sha256(pwd.encode('utf-8')).hexdigest()
    print(f"Password: {pwd}")
    print(f"SHA256: {hash_result}\n")


@@ -1,152 +0,0 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
固定代理IP测试脚本
使用requests请求代理服务器验证代理是否可用
"""
import requests
import json
from damai_proxy_config import get_proxy_config, get_all_enabled_proxies
def test_proxy_requests(proxy_info, target_url="http://httpbin.org/ip"):
"""
使用requests测试代理IP
Args:
proxy_info: 代理信息字典包含server, username, password
target_url: 目标测试URL
"""
print(f"\n{'='*60}")
print(f"🔍 测试代理: {proxy_info.get('name', 'Unknown')}")
print(f" 服务器: {proxy_info['server']}")
print(f" 用户名: {proxy_info['username']}")
print(f" 目标URL: {target_url}")
print(f"{'='*60}")
# 构建代理认证信息
proxy_server = proxy_info['server'].replace('http://', '')
proxy_url = f"http://{proxy_info['username']}:{proxy_info['password']}@{proxy_server}"
proxies = {
"http": proxy_url,
"https": proxy_url
}
try:
# 发送测试请求
print("🚀 发送测试请求...")
response = requests.get(target_url, proxies=proxies, timeout=5) # 减少超时时间到5秒
if response.status_code == 200:
print(f"✅ 代理测试成功!状态码: {response.status_code}")
# 尝试解析IP信息
try:
ip_info = response.json()
print(f"🌐 当前IP信息: {json.dumps(ip_info, indent=2, ensure_ascii=False)}")
except ValueError:
print(f"🌐 页面内容 (前500字符): {response.text[:500]}")
return True
else:
print(f"❌ 代理测试失败!状态码: {response.status_code}")
print(f"响应内容: {response.text[:200]}")
return False
except requests.exceptions.ProxyError:
print("❌ 代理连接错误:无法连接到代理服务器")
return False
except requests.exceptions.ConnectTimeout:
print("❌ 连接超时:代理服务器响应超时")
return False
except requests.exceptions.RequestException as e:
print(f"❌ 请求异常: {str(e)}")
return False
def test_all_proxies():
"""测试所有配置的代理"""
print("🎯 开始测试所有代理IP")
proxies = get_all_enabled_proxies()
if not proxies:
print("❌ 没有找到可用的代理配置")
return
print(f"📊 共找到 {len(proxies)} 个代理IP")
results = []
for i, proxy in enumerate(proxies, 1):
print(f"\n\n{'#'*60}")
print(f"# 测试进度: {i}/{len(proxies)}")
print(f"{'#'*60}")
success = test_proxy_requests(proxy)
results.append({
'proxy': proxy['name'],
'server': proxy['server'],
'success': success
})
if i < len(proxies):
print(f"\n⏳ 等待2秒后测试下一个代理...")
import time
time.sleep(2)
# 输出测试结果汇总
print(f"\n{'='*60}")
print("📊 测试结果汇总:")
print(f"{'='*60}")
success_count = 0
for result in results:
status = "✅ 成功" if result['success'] else "❌ 失败"
print(f" {result['proxy']} ({result['server']}) - {status}")
if result['success']:
success_count += 1
print(f"\n📈 总体成功率: {success_count}/{len(results)} ({success_count/len(results)*100:.1f}%)")
# 如果有成功的代理,显示可用于小红书的代理
successful_proxies = [r for r in results if r['success']]
if successful_proxies:
print(f"\n🎉 以下代理可用于小红书登录发文:")
for proxy in successful_proxies:
print(f" - {proxy['proxy']}: {proxy['server']}")
return results
def test_xhs_proxy_format():
"""测试适用于小红书的代理格式"""
print(f"\n{'='*60}")
print("🔧 测试适用于Playwright的代理格式")
print(f"{'='*60}")
proxies = get_all_enabled_proxies()
for proxy in proxies:
server = proxy['server'].replace('http://', '') # 移除http://前缀
proxy_url = f"http://{proxy['username']}:{proxy['password']}@{server}"
print(f" {proxy['name']}:")
print(f" 服务器地址: {proxy['server']}")
print(f" Playwright格式: {proxy_url}")
print()
if __name__ == "__main__":
print("🚀 开始测试固定代理IP")
# 测试代理格式
test_xhs_proxy_format()
# 测试所有代理
test_all_proxies()
print(f"\n{'='*60}")
print("🎉 代理测试完成!")
print(f"{'='*60}")
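The script above interpolates the username and password straight into the proxy URL, so a `:`, `@` or `/` in either value would corrupt it. One possible way to build the `proxies` mapping with percent-encoded credentials is sketched below. `build_requests_proxies` is a hypothetical helper, not something from `damai_proxy_config`, and it assumes the client decodes percent-escapes in proxy credentials (to my knowledge `requests` does, but this is worth confirming against the installed version).

```python
from typing import Dict
from urllib.parse import quote, urlsplit


def build_requests_proxies(server: str, username: str, password: str) -> Dict[str, str]:
    """Build a `proxies` mapping from a proxy config entry (server, username, password)."""
    if "://" not in server:
        # Assume plain HTTP when the config omits the scheme.
        server = "http://" + server
    parts = urlsplit(server)
    # Percent-encode credentials so reserved characters cannot break the URL.
    auth = f"{quote(username, safe='')}:{quote(password, safe='')}"
    proxy_url = f"{parts.scheme}://{auth}@{parts.netloc}"
    # The same proxy is used for both plain and TLS traffic, as in the script.
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    print(build_requests_proxies("http://1.2.3.4:8080", "user", "p@ss:1"))
```

The returned mapping can be passed as `proxies=` exactly where the script currently passes its hand-built dict.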


@@ -1,126 +0,0 @@
"""
固定代理IP详细测试脚本
测试代理IP在Playwright中的表现包含更多调试信息
"""
import asyncio
import json
import sys
from xhs_login import XHSLoginService
from damai_proxy_config import get_proxy_config
async def test_proxy_detailed(proxy_index: int = 0):
"""详细测试代理IP"""
print(f"\n{'='*60}")
print(f"🔍 详细测试代理: 代理{proxy_index + 1}")
print(f"{'='*60}")
# 获取代理配置
try:
proxy_config = get_proxy_config(proxy_index)
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_config['server'].replace('http://', '', 1)}" # strip the http:// prefix, then reassemble
print(f"✅ 获取代理配置成功: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
except Exception as e:
print(f"❌ 获取代理配置失败: {str(e)}")
return None
# 创建登录服务实例
login_service = XHSLoginService(use_pool=False) # 不使用池,便于调试
try:
# 初始化浏览器(使用代理)
print(f"\n🚀 正在启动浏览器(使用代理)...")
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
print("✅ 浏览器启动成功")
# 测试访问普通网站
print(f"\n📍 测试访问普通网站(百度)...")
try:
await login_service.page.goto('https://www.baidu.com', wait_until='networkidle', timeout=10000)
await asyncio.sleep(2)
title = await login_service.page.title()
url = login_service.page.url
print(f"✅ 百度访问成功")
print(f" 页面标题: {title}")
print(f" 当前URL: {url}")
except Exception as e:
print(f"❌ 百度访问失败: {str(e)}")
# 测试访问IP检测网站
print(f"\n📍 测试访问IP检测网站...")
try:
await login_service.page.goto('http://httpbin.org/ip', wait_until='networkidle', timeout=10000)
await asyncio.sleep(2)
content = await login_service.page.content()
print(f"✅ IP检测网站访问成功")
print(f" 页面内容: {content[:200]}...")
except Exception as e:
print(f"❌ IP检测网站访问失败: {str(e)}")
# 测试访问小红书创作者平台
print(f"\n📍 测试访问小红书创作者平台...")
try:
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=20000) # 增加超时时间
await asyncio.sleep(3) # 等待更长时间
title = await login_service.page.title()
url = login_service.page.url
print(f"✅ 小红书访问成功")
print(f" 页面标题: '{title}'")
print(f" 当前URL: {url}")
# 检查页面内容
content = await login_service.page.content()
if "验证" in content or "captcha" in content.lower() or "block" in content.lower() or "安全验证" in content:
print("⚠️ 检测到可能的验证或拦截")
else:
print("✅ 未检测到验证拦截")
except Exception as e:
print(f"❌ 小红书访问失败: {str(e)}")
# 尝试访问普通页面看看是否完全被封
try:
await login_service.page.goto('https://www.google.com', wait_until='networkidle', timeout=10000)
print(" 提示: 代理可以访问其他网站,但可能被小红书限制")
except Exception:
print(" 提示: 代理可能完全被限制")
print(f"\n✅ 代理{proxy_index + 1} 详细测试完成")
return login_service
except Exception as e:
print(f"❌ 代理{proxy_index + 1} 详细测试失败: {str(e)}")
import traceback
traceback.print_exc()
return None
finally:
# 关闭浏览器
await login_service.close_browser()
async def main():
"""主测试函数"""
print("\n" + "="*60)
print("🎯 固定代理IP详细测试")
print("="*60)
# 测试两个代理
for i in range(2):
await test_proxy_detailed(i)
print(f"\n⏳ 等待3秒后测试下一个代理...")
await asyncio.sleep(3)
print(f"\n{'='*60}")
print("🎉 详细测试完成!")
print("="*60)
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())


@@ -1,219 +0,0 @@
"""
固定代理IP下小红书登录发文功能测试脚本
测试使用固定代理IP进行小红书登录和发文功能
"""
import asyncio
import json
import sys
from xhs_login import XHSLoginService
from xhs_publish import XHSPublishService
from damai_proxy_config import get_proxy_config
async def test_login_with_proxy(proxy_index: int = 0):
"""使用指定代理测试小红书登录"""
print(f"\n{'='*60}")
print(f"🔍 开始测试代理登录: 代理{proxy_index + 1}")
print(f"{'='*60}")
# 获取代理配置
try:
proxy_config = get_proxy_config(proxy_index)
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_config['server'].replace('http://', '', 1)}" # strip the http:// prefix, then reassemble
print(f"✅ 获取代理配置成功: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
except Exception as e:
print(f"❌ 获取代理配置失败: {str(e)}")
return None
# 创建登录服务实例
login_service = XHSLoginService()
try:
# 初始化浏览器(使用代理)
print(f"\n🚀 正在启动浏览器(使用代理)...")
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await login_service.init_browser(proxy=proxy_url, user_agent=user_agent)
print("✅ 浏览器启动成功")
# 访问小红书创作者平台
print(f"\n📍 访问小红书创作者平台...")
await login_service.page.goto('https://creator.xiaohongshu.com/login', wait_until='networkidle', timeout=30000)
await asyncio.sleep(2)
title = await login_service.page.title()
url = login_service.page.url
print(f"✅ 访问成功")
print(f" 页面标题: {title}")
print(f" 当前URL: {url}")
# 检查是否被代理拦截或出现验证码
content = await login_service.page.content()
if "验证" in content or "captcha" in content.lower() or "block" in content.lower():
print("⚠️ 检测到可能的验证或拦截")
print(f"\n✅ 代理{proxy_index + 1} 连接测试完成")
return login_service
except Exception as e:
print(f"❌ 代理{proxy_index + 1} 测试失败: {str(e)}")
import traceback
traceback.print_exc()
return None
finally:
# 注意:这里不关闭浏览器,让调用者决定何时关闭
pass
async def test_publish_with_proxy(cookies, proxy_index: int = 0):
"""使用指定代理测试小红书发文"""
print(f"\n{'='*60}")
print(f"📝 开始测试代理发文: 代理{proxy_index + 1}")
print(f"{'='*60}")
# 获取代理配置
try:
proxy_config = get_proxy_config(proxy_index)
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_config['server'].replace('http://', '', 1)}" # strip the http:// prefix, then reassemble
print(f"✅ 获取代理配置成功: 代理{proxy_index + 1}")
print(f" 代理服务器: {proxy_config['server']}")
except Exception as e:
print(f"❌ 获取代理配置失败: {str(e)}")
return None
# 准备测试数据
title = "【代理测试】固定IP代理发布测试"
content = """这是一条通过固定IP代理发布的测试笔记 📝
测试内容:
- 验证代理IP是否正常工作
- 检查发布功能是否正常
- 确认网络连接稳定性
如果你看到这条笔记,说明代理发布成功了!
#代理测试 #自动化发布 #网络测试"""
# 测试图片(可选)
images = [] # 可以添加图片路径进行测试
# 标签
tags = ["代理测试", "自动化发布", "网络测试"]
try:
# 创建发布服务
print(f"\n🚀 创建发布服务(使用代理: 代理{proxy_index + 1}...")
publisher = XHSPublishService(cookies, proxy=proxy_url)
# 执行发布
print(f"\n📤 开始发布笔记...")
result = await publisher.publish(
title=title,
content=content,
images=images if images else None,
tags=tags
)
# 显示结果
print(f"\n{'='*50}")
print("发布结果:")
print(json.dumps(result, ensure_ascii=False, indent=2))
print("="*50)
if result.get('success'):
print(f"\n✅ 代理{proxy_index + 1} 发布测试成功!")
if 'url' in result:
print(f"📎 笔记链接: {result['url']}")
else:
print(f"\n❌ 代理{proxy_index + 1} 发布测试失败: {result.get('error')}")
return result
except Exception as e:
print(f"❌ 代理{proxy_index + 1} 发布测试异常: {str(e)}")
import traceback
traceback.print_exc()
return None
async def main():
"""主测试函数"""
print("\n" + "="*60)
print("🎯 固定代理IP下小红书登录发文功能测试")
print("="*60)
# 测试代理连接
login_service = None
for i in range(2): # 测试两个代理
login_service = await test_login_with_proxy(i)
if login_service:
print(f"✅ 代理{i+1} 连接测试成功,可以用于后续操作")
break
else:
print(f"⚠️ 代理{i+1} 连接测试失败,尝试下一个代理...")
if not login_service:
print("\n❌ 所有代理都无法连接,测试终止")
return
try:
# 验证登录状态虽然我们没有真正的登录但可以检查Cookie是否有效
print(f"\n🔍 验证当前浏览器状态...")
verify_result = await login_service.verify_login_status()
print(f"验证结果: {verify_result.get('message', '未知状态')}")
except Exception as e:
print(f"验证状态时出错: {str(e)}")
# 如果有cookies.json文件可以尝试使用已保存的cookies进行发布测试
cookies = None
try:
with open('cookies.json', 'r', encoding='utf-8') as f:
cookies = json.load(f)
print(f"\n✅ 成功读取 cookies.json包含 {len(cookies)} 个Cookie")
except FileNotFoundError:
print(f"\n⚠️ cookies.json 文件不存在,跳过发布测试")
print(" 如需测试发布功能请先登录获取Cookie")
if cookies:
# 使用第一个有效的代理进行发布测试
for i in range(2):
proxy_config = get_proxy_config(i)
proxy_url = f"http://{proxy_config['username']}:{proxy_config['password']}@{proxy_config['server'].replace('http://', '', 1)}"
# 测试代理连接
temp_login = XHSLoginService()
try:
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
await temp_login.init_browser(cookies=cookies, proxy=proxy_url, user_agent=user_agent)
# 验证登录状态
verify_result = await temp_login.verify_login_status()
if verify_result.get('logged_in'):
print(f"\n✅ 代理{i+1} + Cookie 组合验证成功,开始发布测试")
await test_publish_with_proxy(cookies, i)
break
else:
print(f"⚠️ 代理{i+1} + Cookie 组合验证失败")
except Exception as e:
print(f"⚠️ 代理{i+1} 连接测试失败: {str(e)}")
finally:
await temp_login.close_browser()
else:
print("\n❌ 所有代理都无法与Cookie配合使用发布测试终止")
# 清理资源
if login_service:
await login_service.close_browser()
print(f"\n{'='*60}")
print("🎉 测试完成!")
print("="*60)
if __name__ == "__main__":
# Windows环境下设置事件循环策略
if sys.platform == 'win32':
asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())
# 运行测试
asyncio.run(main())


@@ -1,90 +0,0 @@
"""
小红书发布功能测试脚本
快速测试发布功能是否正常工作
"""
import asyncio
import json
import os
from xhs_publish import XHSPublishService
async def test_publish():
"""测试发布功能"""
# 1. 从 cookies.json 读取 Cookie
try:
with open('cookies.json', 'r', encoding='utf-8') as f:
cookies = json.load(f)
print(f"✅ 成功读取 {len(cookies)} 个 Cookie")
except FileNotFoundError:
print("❌ cookies.json 文件不存在")
print("请先运行登录获取 Cookie:")
print(" python xhs_cli.py login <手机号> <验证码>")
return
except Exception as e:
print(f"❌ 读取 cookies.json 失败: {e}")
return
# 2. 准备测试数据
title = "【测试】小红书发布功能测试"
content = """这是一条测试笔记 📝
今天测试一下自动发布功能是否正常~
如果你看到这条笔记,说明发布成功了!
#测试 #自动化"""
# 3. 准备测试图片(可选)
images = []
test_image_dir = "temp_uploads"
if os.path.exists(test_image_dir):
for file in os.listdir(test_image_dir):
if file.lower().endswith(('.jpg', '.jpeg', '.png', '.gif')):
img_path = os.path.abspath(os.path.join(test_image_dir, file))
images.append(img_path)
if len(images) >= 3: # 最多3张测试图片
break
if images:
print(f"✅ 找到 {len(images)} 张测试图片")
else:
print("⚠️ 未找到测试图片,将只发布文字")
# 4. 准备标签
tags = ["测试", "自动化发布"]
# 5. 创建发布服务
print("\n开始发布测试笔记...")
publisher = XHSPublishService(cookies)
# 6. 执行发布
result = await publisher.publish(
title=title,
content=content,
images=images if images else None,
tags=tags
)
# 7. 显示结果
print("\n" + "="*50)
print("发布结果:")
print(json.dumps(result, ensure_ascii=False, indent=2))
print("="*50)
if result.get('success'):
print("\n✅ 测试成功!笔记已发布")
if 'url' in result:
print(f"📎 笔记链接: {result['url']}")
else:
print(f"\n❌ 测试失败: {result.get('error')}")
if __name__ == "__main__":
print("="*50)
print("小红书发布功能测试")
print("="*50)
print()
# 运行测试
asyncio.run(test_publish())


@@ -17,6 +17,16 @@ from datetime import datetime
from pathlib import Path
from browser_pool import get_browser_pool
from error_screenshot import save_error_screenshot, save_screenshot_with_html
from loguru import logger
from damai_proxy_config import get_random_proxy, format_proxy_for_playwright
# 配置loguru日志格式
logger.remove() # 移除默认handler
logger.add(
sys.stderr,
format="<green>{time:YYYY-MM-DD HH:mm:ss}</green> | <level>{level: <8}</level> | <cyan>{message}</cyan>",
level="INFO"
)
async def download_image(url: str) -> str:
@@ -65,18 +75,20 @@ async def download_image(url: str) -> str:
class XHSLoginService:
"""小红书登录服务"""
def __init__(self, use_pool: bool = True, headless: bool = True, session_id: Optional[str] = None):
def __init__(self, use_pool: bool = True, headless: bool = True, session_id: Optional[str] = None, use_page_isolation: bool = False):
"""
初始化登录服务
Args:
use_pool: 是否使用浏览器池默认True提升性能
headless: 是否使用无头模式False为有头模式方便调试
session_id: 会话ID用于并发隔离不同的session_id会创建独立的浏览器实例
session_id: 会话 ID用于并发隔离不同的session_id会创建独立的浏览器实例
use_page_isolation: 是否使用页面隔离模式(扫码登录专用,减少浏览器实例数)
"""
self.use_pool = use_pool
self.headless = headless
self.session_id = session_id # 保存session_id用于并发隔离
self.use_page_isolation = use_page_isolation # 页面隔离模式
self.browser_pool = get_browser_pool(headless=headless) if use_pool else None
self.playwright = None
self.browser: Optional[Browser] = None
@@ -84,17 +96,26 @@ class XHSLoginService:
self.page: Optional[Page] = None
self.current_phone = None
async def init_browser(self, cookies: Optional[list] = None, proxy: Optional[str] = None, user_agent: Optional[str] = None, restore_state: bool = False):
async def init_browser(self, cookies: Optional[list] = None, proxy: Optional[dict] = None, user_agent: Optional[str] = None, restore_state: bool = False, use_random_proxy: bool = True):
"""
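Initialize the browser
Args:
cookies: optional list of cookies used to restore the login state
proxy: optional proxy address, e.g. http://user:pass@ip:port
proxy: optional proxy config, e.g. {"server": "http://ip:port", "username": "...", "password": "..."}
user_agent: optional custom User-Agent
restore_state: whether to restore the full login state from the login_state.json file
use_random_proxy: whether to pick a random proxy automatically (default True)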
"""
try:
# 如果没有指定代理且启用自动代理,则使用随机代理
if not proxy and use_random_proxy:
try:
proxy_config = get_random_proxy()
proxy = format_proxy_for_playwright(proxy_config)
logger.info(f"[代理] 自动选择代理: {proxy_config['name']} ({proxy_config['server']})")
except Exception as e:
logger.info(f"[proxy] No proxy available, using a direct connection: {e}")
# 如果要求恢复状态,先加载 login_state.json
login_state = None
if restore_state and os.path.exists('login_state.json'):
@@ -112,12 +133,54 @@ class XHSLoginService:
# 使用浏览器池
if self.use_pool and self.browser_pool:
# 扫码登录使用页面隔离模式
if self.use_page_isolation and self.session_id:
print(f"[页面隔离模式] 获取扫码登录页面 (session_id={self.session_id})", file=sys.stderr)
# 获取或创建页面
self.page = await self.browser_pool.get_qrcode_page(self.session_id)
# 使用浏览器池的主浏览器和context
self.browser = self.browser_pool.browser
self.context = self.browser_pool.context
print("浏览器初始化成功(页面隔离模式)", file=sys.stderr)
return
# 普通浏览器池模式
print(f"[浏览器池模式] 从浏览器池获取实例 (session_id={self.session_id}, headless={self.headless})", file=sys.stderr)
self.browser, self.context, self.page = await self.browser_pool.get_browser(
cookies=cookies, proxy=proxy, user_agent=user_agent, session_id=self.session_id,
headless=self.headless # 传递headless参数
)
# 保存proxy配置
if proxy:
self.proxy = proxy
# 检查page状态如果是空白页或已关闭重新创建page
try:
current_url = self.page.url
print(f"当前URL: {current_url}", file=sys.stderr)
if current_url == 'about:blank' or current_url == '':
print("[浏览器池] 检测到空白页面重新创建page", file=sys.stderr)
try:
# 关闭旧page
await self.page.close()
except Exception as e:
print(f"[浏览器池] 关闭旧page失败: {str(e)}", file=sys.stderr)
# 创建新page
self.page = await self.context.new_page()
print(f"[浏览器池] 已创建新page, 新URL: {self.page.url}", file=sys.stderr)
# 更新浏览器池中保存的page引用
if self.session_id and self.session_id in self.browser_pool.temp_browsers:
self.browser_pool.temp_browsers[self.session_id]["page"] = self.page
print("[浏览器池] 已更新浏览器池中的page引用", file=sys.stderr)
except Exception as e:
print(f"[浏览器池] 检查page状态异常: {str(e)}", file=sys.stderr)
# 如果有localStorage/sessionStorage恢复它们
if login_state:
await self._restore_storage(login_state)
@@ -149,7 +212,8 @@ class XHSLoginService:
],
}
if proxy:
launch_kwargs["proxy"] = {"server": proxy}
launch_kwargs["proxy"] = proxy # 直接使用proxy字典
self.proxy = proxy # 保存proxy配置供后续使用
self.browser = await self.playwright.chromium.launch(**launch_kwargs)
@@ -390,13 +454,13 @@ class XHSLoginService:
except Exception as e:
print(f"⚠️ Failed to restore storage: {str(e)}", file=sys.stderr)
async def init_browser_with_storage_state(self, storage_state_path: str, proxy: Optional[str] = None):
async def init_browser_with_storage_state(self, storage_state_path: str, proxy: Optional[dict] = None):
"""
使用Playwright原生storage_state初始化浏览器最优方案
Args:
storage_state_path: storage_state文件路径
proxy: 可选的代理地址
proxy: 可选的代理配置
"""
try:
if not os.path.exists(storage_state_path):
@@ -424,7 +488,7 @@ class XHSLoginService:
],
}
if proxy:
launch_kwargs["proxy"] = {"server": proxy}
launch_kwargs["proxy"] = proxy # 直接使用proxy字典
self.browser = await self.playwright.chromium.launch(**launch_kwargs)
@@ -574,6 +638,71 @@ class XHSLoginService:
print(f"⚠️ 提取二维码失败: {str(e)}", file=sys.stderr)
return None
async def _navigate_with_qrcode_listener(self, url: str, timeout: int = 120):
"""
带有二维码API监听的页面导航
通过监听https://edith.xiaohongshu.com/api/sns/web/v1/login/qrcode/create
来判断登录框是否已加载完成,而不是等待固定时间
Args:
url: 目标URL
timeout: 最大等待时间默认120秒
"""
qrcode_api_detected = False
# 设置路由监听二维码创建API
async def handle_qrcode_create(route):
nonlocal qrcode_api_detected
try:
request = route.request
logger.info(f"[页面导航] 监听到二维码API请求: {request.url}")
qrcode_api_detected = True
# 继续请求
await route.continue_()
except Exception as e:
logger.error(f"[页面导航] 处理二维码API请求失败: {str(e)}")
await route.continue_()
try:
# 注册路由监听
await self.page.route('**/api/sns/web/v1/login/qrcode/create', handle_qrcode_create)
logger.info(f"[页面导航] 已注册二维码API监听")
# 开始导航,不等待加载完成
try:
await self.page.goto(url, wait_until='commit', timeout=timeout * 1000)
logger.info(f"[页面导航] 已开始导航到 {url}")
except Exception as e:
# 即使超时也继续只要URL匹配
current_url = self.page.url
logger.warning(f"[页面导航] 导航超时,但尝试继续: {str(e)}")
logger.info(f"[页面导航] 当前URL: {current_url}")
# 等待二维码API请求最多等待timeout秒
wait_count = 0
max_wait = timeout * 10 # 每次等待0.1秒
while not qrcode_api_detected and wait_count < max_wait:
await asyncio.sleep(0.1)
wait_count += 1
if qrcode_api_detected:
logger.success(f"[页面导航] 监听到二维码API请求登录框已加载完成耗时{wait_count * 0.1:.1f}秒)")
else:
logger.warning(f"[页面导航] {timeout}秒内未监听到二维码API请求尝试继续")
# 额外等待500ms确保元素渲染完成
await asyncio.sleep(0.5)
finally:
# 移除路由监听
try:
await self.page.unroute('**/api/sns/web/v1/login/qrcode/create')
logger.info(f"[页面导航] 已移除二维码API监听")
except Exception:
pass
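The method above waits for the QR-code request by polling a flag every 0.1 seconds. As an alternative, the same wait could be expressed with `asyncio.Event` and `asyncio.wait_for`. The sketch below is not the author's implementation and uses no Playwright: `wait_for_signal`, `demo` and `fake_qrcode_request` are hypothetical names, and `fake_qrcode_request` merely stands in for the route handler that would call `detected.set()`.

```python
import asyncio


async def wait_for_signal(event: asyncio.Event, timeout: float) -> bool:
    """Return True if `event` is set within `timeout` seconds, else False."""
    try:
        await asyncio.wait_for(event.wait(), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        return False


async def demo() -> tuple:
    detected = asyncio.Event()

    async def fake_qrcode_request() -> None:
        # Stand-in for the route handler: the real one would call
        # detected.set() when the QR-code create request is observed.
        await asyncio.sleep(0.05)
        detected.set()

    task = asyncio.create_task(fake_qrcode_request())
    seen = await wait_for_signal(detected, timeout=1.0)
    await task
    # A second event that is never set exercises the timeout path.
    missed = await wait_for_signal(asyncio.Event(), timeout=0.05)
    return seen, missed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With an event, the waiter wakes as soon as the request is seen rather than on the next 0.1 second tick, and the timeout is stated once instead of being derived from a loop counter.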
async def send_verification_code(self, phone: str, country_code: str = "+86", login_page: str = "creator") -> Dict[str, Any]:
"""
发送验证码
@@ -587,7 +716,10 @@ class XHSLoginService:
Dict containing success status and error message if any
"""
try:
logger.info(f"[发送验证码] 开始 - 手机号: {phone}, 登录页面: {login_page}")
if not self.page:
logger.info(f"[发送验证码] 浏览器未初始化,开始初始化...")
await self.init_browser()
self.current_phone = phone
@@ -608,19 +740,39 @@ class XHSLoginService:
else:
# 页面变了,重新访问登录页
print(f"[预热] 页面已变更 ({current_url}),重新访问{page_name}登录页...", file=sys.stderr)
await self.page.goto(login_url, wait_until='networkidle', timeout=30000)
await asyncio.sleep(0.5)
await self._navigate_with_qrcode_listener(login_url)
else:
# 未预热或不是池模式,正常访问页面
# 未预热或不是池模式,使用监听机制访问页面
print(f"正在访问{page_name}登录页...", file=sys.stderr)
# 优化超时时间缩短到30秒使用networkidle提升加载速度
try:
await self.page.goto(login_url, wait_until='networkidle', timeout=30000)
print("✅ 页面加载完成", file=sys.stderr)
except Exception as e:
print(f"页面加载超时,尝试继续: {str(e)}", file=sys.stderr)
# 超时后等待500ms让关键元素加载
await asyncio.sleep(0.5)
# 先验证代理IP如果配置了代理
if hasattr(self, 'proxy') and self.proxy:
try:
print(f"[代理验证] 配置的代理: {self.proxy.get('server', '未知')}", file=sys.stderr)
print(f"[代理验证] 正在访问 IP 查询网站...", file=sys.stderr)
await self.page.goto('https://httpbin.org/ip', timeout=15000)
ip_info = await self.page.locator('body').inner_text()
print(f"[代理验证] 当前 IP 信息:\n{ip_info}", file=sys.stderr)
# 简单解析IP地址
import json
try:
ip_data = json.loads(ip_info)
current_ip = ip_data.get('origin', '未知')
proxy_host = self.proxy.get('server', '').split('://')[-1].split(':')[0]
if proxy_host in current_ip or current_ip in self.proxy.get('server', ''):
print(f"[代理验证] ✅ 代理生效当前IP: {current_ip}", file=sys.stderr)
else:
print(f"[代理验证] ⚠️ 当前IP ({current_ip}) 与代理IP ({proxy_host}) 不匹配", file=sys.stderr)
except Exception:
print(f"[代理验证] IP信息: {ip_info}", file=sys.stderr)
except Exception as e:
print(f"[代理验证] 验证失败: {str(e)}", file=sys.stderr)
else:
print(f"[代理验证] 未配置代理使用本机IP", file=sys.stderr)
await self._navigate_with_qrcode_listener(login_url)
print(f"✅ 已进入{page_name}登录页面", file=sys.stderr)
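The proxy check added in this hunk compares the proxy host with the `origin` field returned by httpbin.org/ip using substring tests. A stricter comparison that needs no network access could look like the sketch below. `proxy_exit_ip_matches` is a hypothetical helper, and the handling of several comma-separated addresses in `origin` is an assumption about the response format. A mismatch does not prove the proxy is bypassed: a gateway host can differ from the exit IP.

```python
import json
from typing import Optional
from urllib.parse import urlsplit


def proxy_exit_ip_matches(proxy_server: str, httpbin_body: str) -> Optional[bool]:
    """Compare the proxy host with the `origin` field of an httpbin.org/ip body.

    Returns True or False when the body is a JSON object, None otherwise.
    """
    try:
        data = json.loads(httpbin_body)
    except ValueError:
        return None
    if not isinstance(data, dict):
        return None
    # Assumption: `origin` can list several comma-separated addresses.
    origins = [ip.strip() for ip in str(data.get("origin", "")).split(",") if ip.strip()]
    server = proxy_server if "://" in proxy_server else "http://" + proxy_server
    proxy_host = urlsplit(server).hostname or ""
    # Exact match per address avoids false positives such as "1.2.3.4" inside "11.2.3.45".
    return proxy_host in origins


if __name__ == "__main__":
    print(proxy_exit_ip_matches("http://1.2.3.4:8080", '{"origin": "1.2.3.4"}'))
```

Returning `None` for an unparseable body keeps "could not verify" distinct from "verified as not matching", which the log messages in the hunk already treat differently.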
@@ -850,14 +1002,23 @@ class XHSLoginService:
]
# 直接查找,不重试
send_code_btn = None
send_code_selector = None
for selector in selectors:
send_code_btn = await self.page.query_selector(selector)
if send_code_btn:
print(f"✅ 找到发送验证码按钮: {selector}", file=sys.stderr)
send_code_selector = selector
break
if send_code_btn:
if send_code_selector:
# 重新获取元素句柄以确保其有效性
send_code_btn = await self.page.query_selector(send_code_selector)
if not send_code_btn:
return {
"success": False,
"error": "按钮元素已失效,请重试"
}
# 获取按钮文本内容
btn_text = await send_code_btn.inner_text()
btn_text = btn_text.strip() if btn_text else ""
@@ -892,9 +1053,20 @@ class XHSLoginService:
}
print(f"✅ 按钮已激活: class={class_name}", file=sys.stderr)
# 点击按钮
await send_code_btn.click()
print("✅ 已点击发送验证码", file=sys.stderr)
# Re-validate the element before clicking (the page DOM may have changed during the checks above)
try:
# 使用 page.click 直接通过选择器点击,避免元素句柄失效问题
await self.page.click(send_code_selector, timeout=5000)
print("✅ 已点击发送验证码", file=sys.stderr)
except Exception as click_error:
# 如果直接点击失败,尝试重新获取元素点击
print(f"⚠️ 直接点击失败: {str(click_error)}, 尝试重新获取元素", file=sys.stderr)
send_code_btn = await self.page.query_selector(send_code_selector)
if send_code_btn:
await send_code_btn.click()
print("✅ 重新获取元素后点击成功", file=sys.stderr)
else:
raise Exception("按钮元素已失效,无法点击")
# 等待页面响应,检测是否出现验证二维码
await asyncio.sleep(1.5)
@@ -924,6 +1096,7 @@ class XHSLoginService:
}
# 直接返回成功,不再检测滑块
logger.info(f"[发送验证码] 成功 - 手机号: {phone}")
print("\n✅ 验证码发送流程完成,请查看手机短信", file=sys.stderr)
print("请在小程序中输入收到的验证码并点击登录\n", file=sys.stderr)
print("[响应即将返回] success=True, message=验证码发送成功", file=sys.stderr)
@@ -951,6 +1124,7 @@ class XHSLoginService:
except Exception as e:
error_msg = str(e)
logger.error(f"[发送验证码] 异常 - 手机号: {phone}, 错误: {error_msg}")
print(f"\n❌ 发送验证码异常: {error_msg}", file=sys.stderr)
print(f"当前页面URL: {self.page.url if self.page else 'N/A'}", file=sys.stderr)
@@ -2519,3 +2693,649 @@ class XHSLoginService:
"success": False,
"error": str(e)
}
async def start_qrcode_login(self, login_page: str = "home") -> Dict[str, Any]:
"""
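Start the QR-code login flow on the Xiaohongshu home page.
Args:
login_page: Login page type; defaults to "home" (the Xiaohongshu home page). Currently unused: the method always loads the home page.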
Returns:
Dict containing qrcode image and status
"""
try:
if not self.page:
await self.init_browser()
# 访问小红书首页
login_url = 'https://www.xiaohongshu.com'
logger.info(f"[扫码登录] 正在访问小红书首页...")
# 强制访问首页,不管当前在哪个页面
try:
# 使用domcontentloaded而不是networkidle避免等待所有资源加载
await self.page.goto(login_url, wait_until='domcontentloaded', timeout=10000)
current_url = self.page.url
logger.success(f"[扫码登录] 页面加载完成, 当前URL: {current_url}")
# 检查是否跳转到验证码页面
if '/website-login/captcha' in current_url or 'verifyUuid=' in current_url:
logger.warning(f"[扫码登录] 检测到风控验证页面,尝试等待或跳过...")
# 等待30秒看是否会自动跳过
await asyncio.sleep(30)
current_url = self.page.url
logger.info(f"[扫码登录] 等待30秒后当前URL: {current_url}")
# 如果还在验证码页面,返回错误
if '/website-login/captcha' in current_url or 'verifyUuid=' in current_url:
return {
"success": False,
"error": "当前IP被风控需要验证。请稍后再试或启用代理。"
}
except Exception as e:
# 即使超时也继续,因为页面可能已经跳转到explore
current_url = self.page.url
if 'xiaohongshu.com' in current_url:
logger.warning(f"[扫码登录] 页面加载超时但已到达小红书页面: {current_url}")
else:
logger.error(f"[扫码登录] 页面加载失败: {str(e)}, 当前URL: {current_url}")
raise
# 🔥 关键修改: 在explore页面后立即注册路由监听被动等待二维码创建
qrcode_create_data = None
# 设置路由监听二维码创建 API
async def handle_qrcode_create(route):
nonlocal qrcode_create_data
try:
request = route.request
logger.info(f"[扫码登录] API请求: {request.method} {request.url}")
response = await route.fetch()
body = await response.body()
try:
data = json.loads(body.decode('utf-8'))
logger.info(f"[扫码登录] API响应: {json.dumps(data, ensure_ascii=False)}")
if data.get('code') == 0 and data.get('success') and data.get('data'):
qrcode_create_data = data.get('data')
logger.success(f"[扫码登录] 获取到二维码 qr_id={qrcode_create_data.get('qr_id')}")
except Exception as e:
logger.error(f"[扫码登录] 解析响应失败: {str(e)}")
await route.fulfill(response=response)
except Exception as e:
logger.error(f"[扫码登录] 处理API请求失败: {str(e)}")
await route.continue_()
# 注册路由 (在explore页面后立即注册)
await self.page.route('**/api/sns/web/v1/login/qrcode/create', handle_qrcode_create)
logger.info("[扫码登录] 已注册 API路由监听等待页面自动触发二维码创建...")
# 被动等待二维码创建 API请求完成(最多等待30秒)
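for _ in range(300):  # 300 * 0.1 s = 30 s
if qrcode_create_data:
break
await asyncio.sleep(0.1)
# Remove the handler so that repeated calls do not stack route listeners on the page
await self.page.unroute('**/api/sns/web/v1/login/qrcode/create', handle_qrcode_create)
if not qrcode_create_data:
logger.warning("[扫码登录] 30秒内未捕获到二维码创建 API请求尝试从页面提取二维码")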
# 提取二维码和状态(但不检测登录成功,因为这是初始化)
qrcode_result = await self.extract_qrcode_with_status(check_login_success=False)
# 如果获取到二维码创建信息,添加到结果中
if qrcode_create_data:
qrcode_result["qr_id"] = qrcode_create_data.get('qr_id')
qrcode_result["qr_code"] = qrcode_create_data.get('code')
qrcode_result["qr_url"] = qrcode_create_data.get('url')
qrcode_result["multi_flag"] = qrcode_create_data.get('multi_flag')
return qrcode_result
except Exception as e:
print(f"启动扫码登录失败: {str(e)}", file=sys.stderr)
return {
"success": False,
"error": str(e)
}
async def extract_qrcode_with_status(self, check_login_success: bool = True) -> Dict[str, Any]:
"""
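Extract the QR-code image and status information, and detect whether the scan has succeeded.
Args:
check_login_success: Whether to check for a successful login; defaults to True. start_qrcode_login passes False.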
Returns:
Dict containing qrcode image, status text, login success and user data
"""
try:
if not self.page:
return {
"success": False,
"error": "浏览器未初始化"
}
result = {
"success": True,
"qrcode_image": "",
"status_text": "",
"status_desc": "",
"is_expired": False,
"login_success": False, # 新增:是否扫码登录成功
"user_info": None,
"cookies": None,
"cookies_full": None,
"login_state": None
}
# 只有在轮询检查时才判断登录成功
if check_login_success:
# Method 1: request the user-info API directly from the page (the most reliable signal)
user_me_data = None
try:
# 直接请求用户信息API
response = await self.page.evaluate('''
async () => {
try {
const response = await fetch('https://edith.xiaohongshu.com/api/sns/web/v2/user/me', {
method: 'GET',
credentials: 'include'
});
const data = await response.json();
return data;
} catch (error) {
return { error: error.message };
}
}
''')
if response and not response.get('error'):
# 关键修复: 检查是否是游客状态
if response.get('code') == 0 and response.get('success') and response.get('data'):
data = response.get('data')
is_guest = data.get('guest', False)
# 只有非游客状态才算登录成功
if not is_guest and data.get('user_id') and data.get('nickname'):
user_me_data = data
logger.success(f"[扫码登录] 登录成功! user_id={user_me_data.get('user_id')}, nickname={user_me_data.get('nickname')}")
except Exception as e:
logger.error(f"[扫码登录] 请求用户信息 API异常: {str(e)}")
# 如果获取到用户信息,说明登录成功
if user_me_data:
result["login_success"] = True
# 等待页面稳定
await asyncio.sleep(1)
# 获取Cookies
try:
cookies = await self.context.cookies()
cookies_dict = {cookie['name']: cookie['value'] for cookie in cookies}
result["cookies"] = cookies_dict
result["cookies_full"] = cookies
except Exception as e:
logger.error(f"[扫码登录] 获取Cookie失败: {str(e)}")
# 构建用户信息(使用API返回的数据 + localStorage)
try:
# 先从 localStorage 获取基础信息
storage = await self.page.evaluate('() => JSON.stringify(localStorage)')
storage_dict = json.loads(storage)
user_info = {
# 从 API 响应中提取的信息(最准确)
'user_id': user_me_data.get('user_id'),
'red_id': user_me_data.get('red_id'),
'nickname': user_me_data.get('nickname'),
'desc': user_me_data.get('desc'),
'gender': user_me_data.get('gender'),
'avatar_small': user_me_data.get('images'), # 小头像
'avatar_large': user_me_data.get('imageb'), # 大头像
'is_guest': user_me_data.get('guest', False)
}
# 补充 localStorage 中的其他信息
useful_keys = ['b1', 'b1b1', 'p1']
for key in useful_keys:
if key in storage_dict:
try:
value = storage_dict[key]
if value and value.strip():
user_info[key] = json.loads(value) if value.startswith('{') or value.startswith('[') else value
except Exception:
user_info[key] = storage_dict[key]
result["user_info"] = user_info
except Exception as e:
logger.error(f"[扫码登录] 构建用户信息失败: {str(e)}")
# Even if building the full record fails, fall back to the data returned by the API
result["user_info"] = {
'user_id': user_me_data.get('user_id'),
'red_id': user_me_data.get('red_id'),
'nickname': user_me_data.get('nickname'),
'desc': user_me_data.get('desc'),
'gender': user_me_data.get('gender'),
'avatar_small': user_me_data.get('images'),
'avatar_large': user_me_data.get('imageb'),
'is_guest': user_me_data.get('guest', False)
}
# 获取完整的登录状态
try:
current_url = self.page.url
localStorage_data = {}
sessionStorage_data = {}
try:
storage = await self.page.evaluate('() => JSON.stringify(localStorage)')
localStorage_data = json.loads(storage)
except Exception as e:
print(f"⚠️ 获取localStorage失败: {str(e)}", file=sys.stderr)
try:
session_storage = await self.page.evaluate('() => JSON.stringify(sessionStorage)')
sessionStorage_data = json.loads(session_storage)
except Exception as e:
print(f"⚠️ 获取sessionStorage失败: {str(e)}", file=sys.stderr)
result["login_state"] = {
"cookies": result["cookies_full"],
"localStorage": localStorage_data,
"sessionStorage": sessionStorage_data,
"url": current_url,
"timestamp": time.time()
}
print("✅ 已构建完整登录状态", file=sys.stderr)
except Exception as e:
print(f"⚠️ 构建登录状态失败: {str(e)}", file=sys.stderr)
return result
# 如果API请求失败,退而求其次使用页面元素检测
print("⚠️ API检测失败,使用页面元素检测", file=sys.stderr)
current_url = self.page.url
print(f"当前URL: {current_url}", file=sys.stderr)
# Method 2: check whether the QR code is still present (if it has disappeared, the user may have logged in)
qrcode_exists = False
try:
qrcode_img = await self.page.query_selector('.qrcode-img')
if qrcode_img:
qrcode_exists = await qrcode_img.is_visible()
except Exception:
pass
# Method 3: check whether the login modal has closed
login_modal_closed = True
try:
modal_selectors = [
'.login-container',
'.reds-modal',
'[class*="login-modal"]',
'[class*="LoginModal"]',
]
for selector in modal_selectors:
modal = await self.page.query_selector(selector)
if modal and await modal.is_visible():
login_modal_closed = False
break
except Exception:
pass
# Method 4: check for user-info elements that only appear after login
has_user_info = False
try:
user_selectors = [
'.user-info',
'.avatar',
'[class*="user"]',
]
for selector in user_selectors:
user_el = await self.page.query_selector(selector)
if user_el and await user_el.is_visible():
has_user_info = True
break
except Exception:
pass
print(f"登录状态检测: 二维码存在={qrcode_exists}, 登录框关闭={login_modal_closed}, 有用户信息={has_user_info}", file=sys.stderr)
# Combined check: QR code gone AND (login modal closed OR user info present)
if not qrcode_exists and (login_modal_closed or has_user_info):
print("✅ 检测到扫码登录成功!(二维码已消失)", file=sys.stderr)
result["login_success"] = True
# Wait for the page to settle
await asyncio.sleep(1)
# Fetch cookies
try:
cookies = await self.context.cookies()
cookies_dict = {cookie['name']: cookie['value'] for cookie in cookies}
result["cookies"] = cookies_dict
result["cookies_full"] = cookies
print(f"✅ 已获取 {len(cookies)} 个Cookie", file=sys.stderr)
except Exception as e:
print(f"⚠️ 获取Cookie失败: {str(e)}", file=sys.stderr)
# Collect user info
try:
storage = await self.page.evaluate('() => JSON.stringify(localStorage)')
storage_dict = json.loads(storage)
user_info = {}
useful_keys = ['b1', 'b1b1', 'p1']
for key in useful_keys:
if key in storage_dict:
try:
value = storage_dict[key]
if value and value.strip():
user_info[key] = json.loads(value) if value.startswith('{') or value.startswith('[') else value
except Exception:
user_info[key] = storage_dict[key]
result["user_info"] = user_info
print(f"✅ 已获取用户信息: {list(user_info.keys())}", file=sys.stderr)
except Exception as e:
print(f"⚠️ 获取用户信息失败: {str(e)}", file=sys.stderr)
# Capture the full login state
try:
localStorage_data = {}
sessionStorage_data = {}
try:
storage = await self.page.evaluate('() => JSON.stringify(localStorage)')
localStorage_data = json.loads(storage)
except Exception as e:
print(f"⚠️ 获取localStorage失败: {str(e)}", file=sys.stderr)
try:
session_storage = await self.page.evaluate('() => JSON.stringify(sessionStorage)')
sessionStorage_data = json.loads(session_storage)
except Exception as e:
print(f"⚠️ 获取sessionStorage失败: {str(e)}", file=sys.stderr)
result["login_state"] = {
"cookies": result["cookies_full"],
"localStorage": localStorage_data,
"sessionStorage": sessionStorage_data,
"url": current_url,
"timestamp": time.time()
}
print("✅ 已构建完整登录状态", file=sys.stderr)
except Exception as e:
print(f"⚠️ 构建登录状态失败: {str(e)}", file=sys.stderr)
return result
# 还在登录页或不检查登录状态,继续提取二维码和状态
# 提取二维码图片
qrcode_selectors = [
'.qrcode-img',
'img.qrcode-img',
'.qrcode img',
'img[src*="data:image"]',
'img[alt*="二维码"]',
]
for selector in qrcode_selectors:
try:
qrcode_img = await self.page.wait_for_selector(selector, timeout=3000)
if qrcode_img:
# 获取src属性
src = await qrcode_img.get_attribute('src')
if src:
if src.startswith('data:image'):
result["qrcode_image"] = src
else:
# 如果是URL,尝试下载转换
try:
async with aiohttp.ClientSession() as session:
async with session.get(src, timeout=aiohttp.ClientTimeout(total=10)) as response:
if response.status == 200:
img_data = await response.read()
import base64
img_base64 = base64.b64encode(img_data).decode('utf-8')
content_type = response.headers.get('Content-Type', 'image/png')
result["qrcode_image"] = f"data:{content_type};base64,{img_base64}"
print("✅ 成功下载并转换二维码", file=sys.stderr)
except Exception as e:
print(f"⚠️ 下载二维码失败: {str(e)}", file=sys.stderr)
# 如果还是没有图片,尝试截图
if not result["qrcode_image"]:
try:
screenshot_bytes = await qrcode_img.screenshot()
if screenshot_bytes:
import base64
img_base64 = base64.b64encode(screenshot_bytes).decode('utf-8')
result["qrcode_image"] = f"data:image/png;base64,{img_base64}"
print("✅ 成功截取二维码", file=sys.stderr)
except Exception as e:
print(f"⚠️ 截取二维码失败: {str(e)}", file=sys.stderr)
break
except Exception as e:
continue
if not result["qrcode_image"]:
return {
"success": False,
"error": "未找到二维码图片"
}
# 提取状态信息
print("正在提取二维码状态...", file=sys.stderr)
status_selectors = [
'.status',
'.qrcode-status',
'[class*="status"]',
]
for selector in status_selectors:
try:
status_el = await self.page.query_selector(selector)
if status_el:
# 检查状态是否可见
is_visible = await status_el.is_visible()
if not is_visible:
print("二维码状态元素不可见,说明二维码有效", file=sys.stderr)
result["status_text"] = "" # 空字符串表示正常状态
result["is_expired"] = False
break
print(f"✅ 找到状态元素: {selector}", file=sys.stderr)
# 提取状态文本
status_text_el = await status_el.query_selector('.status-text')
if status_text_el:
status_text = await status_text_el.inner_text()
result["status_text"] = status_text.strip()
print(f"状态文本: {result['status_text']}", file=sys.stderr)
# 提取状态描述
status_desc_el = await status_el.query_selector('.status-desc')
if status_desc_el:
status_desc = await status_desc_el.inner_text()
result["status_desc"] = status_desc.strip()
print(f"状态描述: {result['status_desc']}", file=sys.stderr)
# 判断是否过期
if "过期" in result["status_text"] or "过期" in result["status_desc"]:
result["is_expired"] = True
print("⚠️ 二维码已过期", file=sys.stderr)
break
except Exception as e:
continue
# 如果没有找到状态元素,说明二维码正常(不设置status_text小程序端自己显示)
if not result["status_text"]:
result["status_text"] = "" # 空字符串表示正常状态,小程序端不显示覆盖层
result["is_expired"] = False
print(f"✅ 二维码提取完成: 状态={result['status_text']}, 过期={result['is_expired']}, 登录成功={result['login_success']}", file=sys.stderr)
return result
except Exception as e:
print(f"提取二维码状态失败: {str(e)}", file=sys.stderr)
return {
"success": False,
"error": str(e)
}
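The login check inside `extract_qrcode_with_status` hinges on one rule: a `/user/me` response only counts as logged in when it succeeds and describes a non-guest user with both a `user_id` and a `nickname`. As a sketch, that rule can be written as a pure function that is easy to unit-test. The name `extract_logged_in_user` is hypothetical; the field names are the ones read by the method above.

```python
from typing import Any, Dict, Optional


def extract_logged_in_user(response: Any) -> Optional[Dict[str, Any]]:
    """Return the user payload for a real (non-guest) login, else None."""
    if not isinstance(response, dict) or response.get("error"):
        return None
    if response.get("code") != 0 or not response.get("success"):
        return None
    data = response.get("data")
    if not isinstance(data, dict):
        return None
    # Guest sessions also answer with code 0, so they must be filtered out explicitly.
    if data.get("guest", False):
        return None
    if not (data.get("user_id") and data.get("nickname")):
        return None
    return data
```

Keeping the rule outside the browser code means the guest-session edge case can be covered by tests without launching a page.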
async def refresh_qrcode(self) -> Dict[str, Any]:
"""
Refresh the QR code (click the refresh control once the QR code has expired).
Returns:
Dict containing new qrcode and status
"""
try:
if not self.page:
return {
"success": False,
"error": "浏览器未初始化"
}
# Check the page state; if it is a blank page, navigate back to the explore page first
try:
current_url = self.page.url
logger.info(f"[刷新二维码] 当前URL: {current_url}")
if current_url == 'about:blank' or current_url == '':
logger.warning("[刷新二维码] 检测到空白页重新导航到explore页面")
await self.page.goto('https://www.xiaohongshu.com/explore', wait_until='networkidle')
await asyncio.sleep(1)
except Exception as e:
logger.error(f"[刷新二维码] 检查page状态异常: {str(e)}")
logger.info("[刷新二维码] 正在刷新...")
# 🔥 关键修改: 先注册路由监听,然后再打开登录弹窗
qrcode_create_data = None
# 设置路由监听二维码创建 API
async def handle_qrcode_create(route):
nonlocal qrcode_create_data
try:
# 记录请求
request = route.request
logger.info(f"[刷新二维码] API请求: {request.method} {request.url}")
response = await route.fetch()
body = await response.body()
try:
data = json.loads(body.decode('utf-8'))
logger.info(f"[刷新二维码] API响应: {json.dumps(data, ensure_ascii=False)}")
if data.get('code') == 0 and data.get('success') and data.get('data'):
qrcode_create_data = data.get('data')
logger.success(f"[刷新二维码] 获取到新二维码 qr_id={qrcode_create_data.get('qr_id')}")
except Exception as e:
logger.error(f"[刷新二维码] 解析响应失败: {str(e)}")
await route.fulfill(response=response)
except Exception as e:
logger.error(f"[刷新二维码] 处理API请求失败: {str(e)}")
await route.continue_()
# 注册路由 (在打开登录页之前)
await self.page.route('**/api/sns/web/v1/login/qrcode/create', handle_qrcode_create)
logger.info("[刷新二维码] 已注册 API路由监听")
# 确保在登录页面或扫码页面
current_url = self.page.url
if 'login' not in current_url.lower():
# 如果不在登录页,先打开登录页
logger.info("[刷新二维码] 不在登录页,先打开登录页")
try:
login_btn = await self.page.wait_for_selector('text="登录"', timeout=3000)
if login_btn:
await login_btn.click()
await asyncio.sleep(1)
except Exception as e:
logger.warning(f"[刷新二维码] 打开登录页失败: {str(e)}")
# 确保切换到扫码登录选项卡
qrcode_tab_selectors = [
'text="扫码登录"',
'div:has-text("扫码登录")',
'text="二维码登录"',
'div:has-text("二维码登录")',
'.qrcode-tab',
'[data-type="qrcode"]',
]
for selector in qrcode_tab_selectors:
try:
qrcode_tab = await self.page.query_selector(selector)
if qrcode_tab:
logger.info("[刷新二维码] 切换到扫码登录模式")
await qrcode_tab.click()
await asyncio.sleep(0.5)
break
except Exception:
continue
# 查找刷新按钮或刷新文本
refresh_selectors = [
'.status-desc.refresh',
'text="点击刷新"',
'.refresh',
'[class*="refresh"]',
]
refresh_clicked = False
for selector in refresh_selectors:
try:
refresh_el = await self.page.query_selector(selector)
if refresh_el:
logger.info(f"[刷新二维码] 找到刷新按钮: {selector}")
await refresh_el.click()
logger.success("[刷新二维码] 已点击刷新")
await asyncio.sleep(1)
refresh_clicked = True
break
except Exception:
continue
if not refresh_clicked:
return {
"success": False,
"error": "未找到刷新按钮"
}
# 等待二维码创建 API请求完成(最多等待 3 秒)
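for _ in range(30):  # 30 * 0.1 s = 3 s
if qrcode_create_data:
break
await asyncio.sleep(0.1)
# Remove the handler so that repeated refreshes do not stack route listeners on the page
await self.page.unroute('**/api/sns/web/v1/login/qrcode/create', handle_qrcode_create)
if not qrcode_create_data:
logger.warning("[刷新二维码] 未捕获到二维码创建 API请求")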
# 重新提取二维码
qrcode_result = await self.extract_qrcode_with_status(check_login_success=False)
# 如果获取到二维码创建信息,添加到结果中
if qrcode_create_data:
qrcode_result["qr_id"] = qrcode_create_data.get('qr_id')
qrcode_result["qr_code"] = qrcode_create_data.get('code')
qrcode_result["qr_url"] = qrcode_create_data.get('url')
qrcode_result["multi_flag"] = qrcode_create_data.get('multi_flag')
logger.success("[刷新二维码] 已将二维码创建信息添加到返回结果")
return qrcode_result
except Exception as e:
logger.error(f"[刷新二维码] 失败: {str(e)}")
return {
"success": False,
"error": str(e)
}
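Both `start_qrcode_login` and `refresh_qrcode` copy the same four fields from the captured create-API payload into the result. A shared helper could remove that duplication; the sketch below is one possible shape. `merge_qrcode_metadata` and `_QR_FIELDS` are hypothetical names, and the key mapping mirrors the assignments in the two methods.

```python
from typing import Any, Dict, Optional

# Output key in the result -> key in the captured create-API payload.
_QR_FIELDS = {
    "qr_id": "qr_id",
    "qr_code": "code",
    "qr_url": "url",
    "multi_flag": "multi_flag",
}


def merge_qrcode_metadata(
    result: Dict[str, Any], create_data: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a copy of result with the QR-code metadata added when it was captured."""
    merged = dict(result)
    if not create_data:
        return merged
    for out_key, src_key in _QR_FIELDS.items():
        merged[out_key] = create_data.get(src_key)
    return merged
```

Returning a copy keeps the helper free of side effects; the in-place assignment used by the two methods would work equally well.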