2025-12-2genxin
@@ -1,460 +0,0 @@
# 📡 WeChat Official Account Article Crawler - API Reference

## Server Information

- **Base URL**: http://localhost:8080
- **Protocol**: HTTP/1.1
- **Data format**: JSON
- **Character encoding**: UTF-8
- **CORS**: enabled (all origins allowed)

## Unified Response Format

Every API endpoint returns a response in the same shape:

```json
{
  "success": true,        // whether the request succeeded
  "message": "操作成功",   // status message ("operation succeeded")
  "data": {}              // payload (optional)
}
```
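
A Go client could model this envelope as a small struct. The following is a minimal sketch, not code from the project: `Envelope` and `decodeEnvelope` are hypothetical names, and only the JSON field names come from the format above. `Data` is kept as `json.RawMessage` so each endpoint can decode its own payload later. Note that the `//` annotations in the example above are explanatory and would not be valid in a real JSON body.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope mirrors the unified response format.
// The type name is hypothetical; the JSON field names come from the document.
type Envelope struct {
	Success bool            `json:"success"`
	Message string          `json:"message"`
	Data    json.RawMessage `json:"data,omitempty"`
}

// decodeEnvelope parses a raw response body into an Envelope.
func decodeEnvelope(body []byte) (Envelope, error) {
	var env Envelope
	err := json.Unmarshal(body, &env)
	return env, err
}

func main() {
	body := []byte(`{"success": true, "message": "操作成功", "data": {}}`)
	env, err := decodeEnvelope(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(env.Success, env.Message, string(env.Data))
}
```

Deferring the decoding of `data` keeps one envelope type usable across all endpoints, at the cost of a second `json.Unmarshal` call per response.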

## Endpoints

### 1. Extract the Official Account Homepage

**Endpoint**: `/api/homepage/extract`
**Method**: POST
**Description**: Extracts the official account's homepage link from an article link.

#### Request Parameters

```json
{
  "url": "https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx"
}
```

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | Yes | Link to an official-account article |

#### Response Example

**Success** (`message` means "extraction succeeded"):
```json
{
  "success": true,
  "message": "提取成功",
  "data": {
    "homepage": "https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=xxx&scene=124",
    "output": "full command-line output"
  }
}
```

**Failure** (`message` means "could not extract the homepage link"):
```json
{
  "success": false,
  "message": "未能提取到主页链接"
}
```

#### Usage Examples

**jQuery**:
```javascript
$.ajax({
  url: 'http://localhost:8080/api/homepage/extract',
  method: 'POST',
  contentType: 'application/json',
  data: JSON.stringify({
    url: 'https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx'
  }),
  success: function(response) {
    if (response.success) {
      console.log('Homepage link:', response.data.homepage);
    }
  }
});
```

**curl**:
```bash
curl -X POST http://localhost:8080/api/homepage/extract \
  -H "Content-Type: application/json" \
  -d '{"url":"https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx"}'
```
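
The same call could be made from Go. This is a sketch under stated assumptions rather than a client shipped with the project: `extractHomepage` and `parseExtractResponse` are hypothetical helper names, and the response shape is taken from the examples above. Parsing is split from the HTTP call so it can be exercised without a running server.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// parseExtractResponse decodes a response body from /api/homepage/extract
// and returns the homepage link, or an error carrying the server message.
func parseExtractResponse(body []byte) (string, error) {
	var res struct {
		Success bool   `json:"success"`
		Message string `json:"message"`
		Data    struct {
			Homepage string `json:"homepage"`
			Output   string `json:"output"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &res); err != nil {
		return "", err
	}
	if !res.Success {
		return "", fmt.Errorf("extract failed: %s", res.Message)
	}
	return res.Data.Homepage, nil
}

// extractHomepage posts articleURL to baseURL+"/api/homepage/extract".
func extractHomepage(baseURL, articleURL string) (string, error) {
	payload, err := json.Marshal(map[string]string{"url": articleURL})
	if err != nil {
		return "", err
	}
	resp, err := http.Post(baseURL+"/api/homepage/extract", "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseExtractResponse(body)
}

func main() {
	// Requires the API server to be running locally.
	homepage, err := extractHomepage("http://localhost:8080", "https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("homepage:", homepage)
}
```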

---

### 2. Download a Single Article

**Endpoint**: `/api/article/download`
**Method**: POST
**Description**: Downloads the specified article.

#### Request Parameters

```json
{
  "url": "https://mp.weixin.qq.com/s?__biz=xxx",
  "save_image": true,
  "save_content": true
}
```

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | Yes | Article link |
| save_image | boolean | No | Whether to save images (default: false) |
| save_content | boolean | No | Whether to save the article content (default: true) |

#### Response Example

The `message` below means "download task started":

```json
{
  "success": true,
  "message": "下载任务已启动",
  "data": {
    "url": "https://mp.weixin.qq.com/s?__biz=xxx"
  }
}
```
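
Because `save_content` defaults to true, a Go decoder cannot use a plain `bool` field: an omitted field and an explicit `false` would both decode to `false`. One way to apply the documented defaults is to decode into pointer fields, as in the sketch below. This is an illustration only; `downloadRequest`, `options`, and `parseDownloadRequest` are hypothetical names, and the document does not show how the server itself handles these defaults.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// downloadRequest mirrors the request body. Pointer fields let the decoder
// distinguish "omitted" (nil) from an explicit false.
type downloadRequest struct {
	URL         string `json:"url"`
	SaveImage   *bool  `json:"save_image"`
	SaveContent *bool  `json:"save_content"`
}

// options holds the effective settings after defaults are applied.
type options struct {
	URL         string
	SaveImage   bool
	SaveContent bool
}

// parseDownloadRequest decodes body and applies the documented defaults:
// save_image=false, save_content=true.
func parseDownloadRequest(body []byte) (options, error) {
	var req downloadRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return options{}, err
	}
	if req.URL == "" {
		return options{}, fmt.Errorf("url is required")
	}
	opts := options{URL: req.URL, SaveImage: false, SaveContent: true}
	if req.SaveImage != nil {
		opts.SaveImage = *req.SaveImage
	}
	if req.SaveContent != nil {
		opts.SaveContent = *req.SaveContent
	}
	return opts, nil
}

func main() {
	opts, err := parseDownloadRequest([]byte(`{"url": "https://mp.weixin.qq.com/s?__biz=xxx"}`))
	fmt.Println(opts, err)
}
```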

---

### 3. Get the Article List

**Endpoint**: `/api/article/list`
**Method**: POST
**Description**: Fetches an official account's article list in bulk.

#### Request Parameters

```json
{
  "access_token": "https://mp.weixin.qq.com/mp/profile_ext?action=xxx&appmsg_token=xxx",
  "pages": 0
}
```

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| access_token | string | Yes | A URL that contains the `appmsg_token` parameter |
| pages | integer | No | Number of pages to fetch; 0 means all pages (default: 0) |

#### Response Example

The `message` below means "task started":

```json
{
  "success": true,
  "message": "任务已启动"
}
```
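
Since `access_token` is a full URL rather than a bare token, a client may want to check that it carries the expected query parameter before submitting it. The sketch below uses only the standard `net/url` package; `queryParam` is a hypothetical helper, not part of the project.

```go
package main

import (
	"fmt"
	"net/url"
)

// queryParam returns the named query parameter from rawURL,
// or an error if the URL is malformed or the parameter is missing.
func queryParam(rawURL, name string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", fmt.Errorf("invalid URL: %w", err)
	}
	v := u.Query().Get(name)
	if v == "" {
		return "", fmt.Errorf("missing %s parameter", name)
	}
	return v, nil
}

func main() {
	raw := "https://mp.weixin.qq.com/mp/profile_ext?action=xxx&appmsg_token=xxx"
	token, err := queryParam(raw, "appmsg_token")
	fmt.Println(token, err)
}
```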

---

### 4. Batch Download Articles

**Endpoint**: `/api/article/batch`
**Method**: POST
**Description**: Downloads all articles of an official account.

#### Request Parameters

```json
{
  "official_account": "official account name or article link",
  "save_image": true,
  "save_content": true
}
```

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| official_account | string | Yes | Official account name, or a link to any of its articles |
| save_image | boolean | No | Whether to save images (default: false) |
| save_content | boolean | No | Whether to save the article content (default: true) |

#### Response Example

The `message` below means "task started":

```json
{
  "success": true,
  "message": "任务已启动"
}
```

---

### 5. Get the Data List

**Endpoint**: `/api/data/list`
**Method**: GET
**Description**: Lists the official-account data that has already been downloaded.

#### Request Parameters

None

#### Response Example

```json
{
  "success": true,
  "data": [
    {
      "name": "研招网资讯",
      "article_count": 125,
      "path": "D:\\workspace\\Access_wechat_article\\backend\\data\\研招网资讯",
      "last_update": "2025-11-27"
    }
  ]
}
```

| Field | Type | Description |
|-------|------|-------------|
| name | string | Official account name |
| article_count | integer | Number of articles |
| path | string | Storage path |
| last_update | string | Date of the last update |

#### Usage Examples

**jQuery**:
```javascript
$.get('http://localhost:8080/api/data/list', function(response) {
  if (response.success) {
    console.log('Data list:', response.data);
  }
});
```

**curl**:
```bash
curl http://localhost:8080/api/data/list
```

---

### 6. Get the Task Status

**Endpoint**: `/api/task/status`
**Method**: GET
**Description**: Returns the execution status of the current task.

#### Request Parameters

None

#### Response Example

**Task running** (`message` means "downloading article 10..."):
```json
{
  "success": true,
  "data": {
    "running": true,
    "progress": 45,
    "message": "正在下载第10篇文章..."
  }
}
```

**No task running**:
```json
{
  "success": true,
  "data": {
    "running": false,
    "progress": 0,
    "message": ""
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| running | boolean | Whether a task is currently running |
| progress | integer | Task progress (0-100) |
| message | string | Description of the task state |
| error | string | Error message (optional) |
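
A client that starts a task through one of the POST endpoints can poll this endpoint until `running` becomes false. The sketch below is a hypothetical Go poller, not project code: `taskStatus`, `parseTaskStatus`, and `waitForTask` are assumed names, and the two-second interval is arbitrary. Note that the "no task running" example above also has `running: false`, so `running` alone cannot distinguish a finished task from one that never started; checking `progress` or `error` as well is advisable.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// taskStatus mirrors the fields of the "data" object documented above.
type taskStatus struct {
	Running  bool   `json:"running"`
	Progress int    `json:"progress"`
	Message  string `json:"message"`
	Error    string `json:"error,omitempty"`
}

// parseTaskStatus decodes a /api/task/status response body.
func parseTaskStatus(body []byte) (taskStatus, error) {
	var env struct {
		Success bool       `json:"success"`
		Message string     `json:"message"`
		Data    taskStatus `json:"data"`
	}
	if err := json.Unmarshal(body, &env); err != nil {
		return taskStatus{}, err
	}
	if !env.Success {
		return taskStatus{}, fmt.Errorf("status request failed: %s", env.Message)
	}
	return env.Data, nil
}

// waitForTask polls baseURL+"/api/task/status" every interval until no task
// is running, then returns the last status seen.
func waitForTask(baseURL string, interval time.Duration) (taskStatus, error) {
	for {
		resp, err := http.Get(baseURL + "/api/task/status")
		if err != nil {
			return taskStatus{}, err
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return taskStatus{}, err
		}
		st, err := parseTaskStatus(body)
		if err != nil {
			return taskStatus{}, err
		}
		if !st.Running {
			return st, nil
		}
		fmt.Printf("%d%% %s\n", st.Progress, st.Message)
		time.Sleep(interval)
	}
}

func main() {
	// Requires the API server to be running locally.
	st, err := waitForTask("http://localhost:8080", 2*time.Second)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("done: %+v\n", st)
}
```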
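
---

## Error Codes

### HTTP Status Codes

| Status code | Meaning |
|-------------|---------|
| 200 | Request succeeded |
| 400 | Invalid request parameters |
| 500 | Internal server error |

### Business Errors

All business errors are reported through the `success` and `message` fields of the response:

```json
{
  "success": false,
  "message": "<specific error message>"
}
```

Common error messages (the server returns the Chinese text; English glosses are in parentheses):

| Error message | Meaning | Fix |
|---------------|---------|-----|
| 请求参数错误 ("invalid request parameters") | Malformed JSON or a missing required parameter | Check the request body format |
| 执行失败 ("execution failed") | The backend program failed | Check the detailed error message |
| 未能提取到主页链接 ("could not extract the homepage link") | The article link is malformed or could not be parsed | Use a valid article link |
| 读取数据目录失败 ("failed to read the data directory") | The `data` directory does not exist or is not accessible | Check the directory and its permissions |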

---

## Development Guide

### Local Testing

1. **Start the API server** (Windows command prompt):
```bat
cd backend\api
start_api.bat
```

2. **Test the endpoints**:
```bash
# Test homepage extraction
curl -X POST http://localhost:8080/api/homepage/extract \
  -H "Content-Type: application/json" \
  -d "{\"url\":\"<article link>\"}"

# Test the data list
curl http://localhost:8080/api/data/list
```

### CORS Configuration

The API server has CORS enabled and allows requests from all origins:

```go
w.Header().Set("Access-Control-Allow-Origin", "*")
w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
```

To restrict access to specific domains, modify the `corsMiddleware` function in `server.go`.
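
One possible shape for a restricted `corsMiddleware` is sketched below. It is an assumption-based illustration, not the project's implementation: the `allowedOrigins` map, its single entry, and the `originAllowed` helper are hypothetical. The idea is to echo the request's `Origin` back only when it is on an allow-list, instead of sending `*`.

```go
package main

import (
	"fmt"
	"net/http"
)

// allowedOrigins is a hypothetical allow-list; replace the entry with the
// real front-end origins.
var allowedOrigins = map[string]bool{
	"http://localhost:3000": true,
}

// originAllowed reports whether origin is on the allow-list.
func originAllowed(origin string) bool {
	return allowedOrigins[origin]
}

// corsMiddleware sets CORS headers only for allow-listed origins.
func corsMiddleware(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// The response varies by Origin, so caches must key on it.
		w.Header().Add("Vary", "Origin")
		origin := r.Header.Get("Origin")
		if originAllowed(origin) {
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
			w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
		}
		if r.Method == http.MethodOptions {
			// Answer preflight requests without invoking the handler.
			w.WriteHeader(http.StatusOK)
			return
		}
		next(w, r)
	}
}

func main() {
	fmt.Println(originAllowed("http://localhost:3000"))
	fmt.Println(originAllowed("http://evil.example"))
}
```

Browsers block cross-origin responses that lack a matching `Access-Control-Allow-Origin` header, so omitting the header for unknown origins is enough to deny them; non-browser clients are unaffected and still need authentication.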

### Timeout Settings

Default HTTP timeout: 30 seconds

To change it, add the following to `server.go`:

```go
server := &http.Server{
    Addr:         ":8080",
    ReadTimeout:  30 * time.Second,
    WriteTimeout: 30 * time.Second,
}
```

### Logging

The API server logs through Go's standard `log` package, which writes to standard error unless redirected with `log.SetOutput`:

```go
log.Printf("[%s] %s - %s", r.Method, r.URL.Path, message)
```

---

## API Roadmap

### v1.1.0 (planned)
- [ ] Add user authentication
- [ ] Support task queue management
- [ ] Push download progress (WebSocket)
- [ ] Provide an article search endpoint

### v1.2.0 (planned)
- [ ] Statistics and analytics endpoints
- [ ] Export (PDF/Word)
- [ ] Batch task management
- [ ] Scheduled tasks

---

## Tech Stack

- **Language**: Go 1.20+
- **Web framework**: net/http (standard library)
- **Data format**: JSON
- **Concurrency model**: goroutines

---

## Performance Notes

### Concurrency
- Multiple clients can access the API at the same time
- Only one crawler task can run at a time (`currentTask`)
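
Because `net/http` serves each request on its own goroutine, a "one task at a time" rule needs a guard that is safe under concurrent access. The sketch below shows one way to do that with a mutex. It is a hypothetical illustration (`taskGuard`, `tryStart`, `finish`, and `errBusy` are assumed names) and does not describe how `currentTask` is actually protected in `server.go`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errBusy is returned when a task is already in progress.
var errBusy = errors.New("another crawler task is already running")

// taskGuard allows at most one task at a time.
type taskGuard struct {
	mu      sync.Mutex
	running bool
}

// tryStart marks the guard as running, or returns errBusy if a task is active.
func (g *taskGuard) tryStart() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.running {
		return errBusy
	}
	g.running = true
	return nil
}

// finish marks the current task as done.
func (g *taskGuard) finish() {
	g.mu.Lock()
	g.running = false
	g.mu.Unlock()
}

func main() {
	var g taskGuard
	fmt.Println(g.tryStart()) // <nil>
	fmt.Println(g.tryStart()) // another crawler task is already running
	g.finish()
	fmt.Println(g.tryStart()) // <nil>
}
```

A handler would call `tryStart` before launching the crawler, reply with a "busy" message on `errBusy`, and `defer g.finish()` once the task is accepted.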

### Resource Usage
- CPU: low (mostly I/O-bound)
- Memory: <50MB
- Disk: depends on the number of downloaded articles

### Optimization Suggestions
1. Use a connection pool for HTTP requests
2. Implement a task queue
3. Cache results
4. Enable gzip compression

---

## Security Recommendations

### 1. Production Deployment
- Add HTTPS support
- Implement API authentication (JWT/OAuth)
- Restrict allowed CORS origins
- Add request rate limiting

### 2. Data Security
- Do not expose sensitive information (cookies)
- Clean up temporary files regularly
- Back up important data

### 3. Access Control
- Add an IP allow-list
- Implement user permission management
- Keep a log of operations

---

## FAQ

### Q1: Why is there no response after a task starts?
A: Check that the backend `wechat-crawler.exe` exists and is executable.

### Q2: How do I see detailed error messages?
A: Check the console output in the API server window.

### Q3: Can multiple download tasks run at the same time?
A: Not in the current version; only one task can run at a time.

### Q4: How do I stop a running task?
A: Close the API server window or restart the server.

---

**Document version**: v1.0.0
**Last updated**: 2025-11-27
**Maintainer**: AI Assistant
BIN
backend/api/api-server.exe
Normal file
Binary file not shown.
BIN
backend/api/api-server.exe~
Normal file
Binary file not shown.
@@ -5,8 +5,8 @@ echo 📦 编译 API 服务器
 echo ===============================================
 echo.
 
-echo 🔨 正在编译 api_server.exe...
-go build -o api_server.exe server.go
+echo 🔨 正在编译 api-server.exe...
+go build -o api-server.exe server.go
 
 if %errorlevel% neq 0 (
 echo.
@@ -18,7 +18,7 @@ if %errorlevel% neq 0 (
 
 echo.
 echo ✅ 编译成功!
-echo 📁 输出文件: api_server.exe
+echo 📁 输出文件: api-server.exe
 echo.
 echo ===============================================
 echo 编译完成
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -1,15 +1,21 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"crypto/rand"
|
||||
"encoding/hex"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"log"
|
||||
"net/http"
|
||||
"net/url"
|
||||
"os"
|
||||
"os/exec"
|
||||
"path/filepath"
|
||||
"regexp"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"github.com/wechat-crawler/pkg/wechat"
|
||||
)
|
||||
|
||||
// Response is the unified response envelope.
|
||||
@@ -17,6 +23,7 @@ type Response struct {
|
||||
Success bool `json:"success"`
|
||||
Message string `json:"message"`
|
||||
Data interface{} `json:"data,omitempty"`
|
||||
Code int `json:"code,omitempty"`
|
||||
}
|
||||
|
||||
// TaskStatus describes the state of the current task.
|
||||
@@ -27,7 +34,28 @@ type TaskStatus struct {
|
||||
Error string `json:"error,omitempty"`
|
||||
}
|
||||
|
||||
// LoginRequest is the user login request body.
|
||||
type LoginRequest struct {
|
||||
Username string `json:"username"`
|
||||
Password string `json:"password"`
|
||||
}
|
||||
|
||||
// RegisterRequest is the user registration request body.
|
||||
type RegisterRequest struct {
|
||||
Username string `json:"username"`
|
||||
Password string `json:"password"`
|
||||
Email string `json:"email"`
|
||||
}
|
||||
|
||||
// Session is an in-memory login session.
|
||||
type Session struct {
|
||||
Token string
|
||||
UserID int
|
||||
Expiry time.Time
|
||||
}
|
||||
|
||||
var currentTask = &TaskStatus{Running: false}
|
||||
var sessions = make(map[string]*Session)
|
||||
|
||||
func main() {
|
||||
// 启用CORS
|
||||
@@ -36,10 +64,18 @@ func main() {
|
||||
http.HandleFunc("/api/article/download", corsMiddleware(downloadArticleHandler))
|
||||
http.HandleFunc("/api/article/list", corsMiddleware(getArticleListHandler))
|
||||
http.HandleFunc("/api/article/batch", corsMiddleware(batchDownloadHandler))
|
||||
http.HandleFunc("/api/article/detail", corsMiddleware(getArticleDetailHandler))
|
||||
http.HandleFunc("/api/data/list", corsMiddleware(getDataListHandler))
|
||||
http.HandleFunc("/api/task/status", corsMiddleware(getTaskStatusHandler))
|
||||
http.HandleFunc("/api/download/", corsMiddleware(downloadFileHandler))
|
||||
|
||||
// 用户认证接口
|
||||
http.HandleFunc("/api/user/register", corsMiddleware(registerHandler))
|
||||
http.HandleFunc("/api/user/login", corsMiddleware(loginHandler))
|
||||
http.HandleFunc("/api/user/logout", corsMiddleware(logoutHandler))
|
||||
http.HandleFunc("/api/user/info", corsMiddleware(getUserInfoHandler))
|
||||
http.HandleFunc("/api/user/update", corsMiddleware(updateUserHandler))
|
||||
|
||||
port := ":8080"
|
||||
fmt.Println("===============================================")
|
||||
fmt.Println(" 🚀 微信公众号文章爬虫 API 服务器")
|
||||
@@ -58,7 +94,7 @@ func corsMiddleware(next http.HandlerFunc) http.HandlerFunc {
|
||||
return func(w http.ResponseWriter, r *http.Request) {
|
||||
w.Header().Set("Access-Control-Allow-Origin", "*")
|
||||
w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
|
||||
w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
|
||||
w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization")
|
||||
|
||||
if r.Method == "OPTIONS" {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
@@ -98,6 +134,9 @@ func handleRoot(w http.ResponseWriter, r *http.Request) {
|
||||
<div class="endpoint">
|
||||
<span class="method">POST</span> /api/article/list - 获取文章列表
|
||||
</div>
|
||||
<div class="endpoint">
|
||||
<span class="method">POST</span> /api/article/detail - 获取文章详情(阅读量、点赞数、评论等)
|
||||
</div>
|
||||
<div class="endpoint">
|
||||
<span class="method">POST</span> /api/article/batch - 批量下载文章
|
||||
</div>
|
||||
@@ -216,12 +255,12 @@ func getArticleListHandler(w http.ResponseWriter, r *http.Request) {
|
||||
currentTask.Progress = 0
|
||||
currentTask.Message = "正在获取文章列表..."
|
||||
|
||||
// 同步执行爬虫程序(功能3)
|
||||
// 同步执行爬虫程序(功能2:获取文章列表)
|
||||
exePath := filepath.Join("..", "wechat-crawler.exe")
|
||||
absPath, _ := filepath.Abs(exePath)
|
||||
workDir, _ := filepath.Abs("..")
|
||||
|
||||
log.Printf("启动功能3: %s, 工作目录: %s", absPath, workDir)
|
||||
log.Printf("启动功能2: %s, 工作目录: %s", absPath, workDir)
|
||||
cmd := exec.Command(absPath)
|
||||
cmd.Dir = workDir
|
||||
|
||||
@@ -242,8 +281,8 @@ func getArticleListHandler(w http.ResponseWriter, r *http.Request) {
|
||||
return
|
||||
}
|
||||
|
||||
// 发送选项"3"(功能3:通过access_token获取文章列表)
|
||||
fmt.Fprintln(stdin, "3")
|
||||
// 发送选项"2"(功能2:通过access_token获取文章列表)
|
||||
fmt.Fprintln(stdin, "2")
|
||||
fmt.Fprintln(stdin, req.AccessToken)
|
||||
if req.Pages > 0 {
|
||||
fmt.Fprintf(stdin, "%d\n", req.Pages)
|
||||
@@ -445,6 +484,304 @@ func batchDownloadHandler(w http.ResponseWriter, r *http.Request) {
|
||||
})
|
||||
}
|
||||
|
||||
// getArticleDetailHandler fetches article details (feature 4: read counts, likes, comments, etc.).
|
||||
func getArticleDetailHandler(w http.ResponseWriter, r *http.Request) {
|
||||
var req struct {
|
||||
AccessToken string `json:"access_token"`
|
||||
Pages int `json:"pages"`
|
||||
}
|
||||
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
log.Printf("❌ 解析请求失败: %v", err)
|
||||
writeJSON(w, Response{Success: false, Message: "请求参数错误: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
if req.AccessToken == "" {
|
||||
log.Printf("❌ Access Token 为空")
|
||||
writeJSON(w, Response{Success: false, Message: "请输入Access Token URL"})
|
||||
return
|
||||
}
|
||||
|
||||
log.Print("\n" + strings.Repeat("=", 60))
|
||||
log.Printf("📊 开始获取文章详情功能")
|
||||
log.Printf("接收到的 Access Token: %s", req.AccessToken[:min(100, len(req.AccessToken))])
|
||||
log.Printf("获取页数: %d (0表示全部)", req.Pages)
|
||||
|
||||
currentTask.Running = true
|
||||
currentTask.Progress = 0
|
||||
currentTask.Message = "正在解析Access Token参数..."
|
||||
|
||||
// 从Access Token URL中提取参数
|
||||
params, err := parseAccessToken(req.AccessToken)
|
||||
if err != nil {
|
||||
log.Printf("❌ 解析Access Token失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "Access Token 参数格式错误: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
log.Printf("✅ 参数解析成功:")
|
||||
log.Printf(" - biz: %s", params["biz"][:min(20, len(params["biz"]))])
|
||||
log.Printf(" - uin: %s", params["uin"])
|
||||
log.Printf(" - key: %s", params["key"][:min(20, len(params["key"]))])
|
||||
log.Printf(" - pass_ticket: %s", params["pass_ticket"][:min(20, len(params["pass_ticket"]))])
|
||||
|
||||
// 创建爬虫实例
|
||||
log.Printf("🔧 创建爬虫实例...")
|
||||
crawler, err := wechat.NewWechatCrawler(
|
||||
params["biz"],
|
||||
params["uin"],
|
||||
params["key"],
|
||||
params["pass_ticket"],
|
||||
nil,
|
||||
)
|
||||
if err != nil {
|
||||
log.Printf("❌ 创建爬虫实例失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "创建爬虫实例失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
log.Printf("✅ 爬虫实例创建成功")
|
||||
|
||||
currentTask.Progress = 20
|
||||
currentTask.Message = "正在获取公众号名称..."
|
||||
|
||||
// 获取公众号名称
|
||||
log.Printf("📱 获取公众号名称...")
|
||||
officialName, err := crawler.GetOfficialAccountName()
|
||||
if err != nil {
|
||||
log.Printf("❌ 获取公众号名称失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "获取公众号名称失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
log.Printf("✅ 公众号名称: %s", officialName)
|
||||
|
||||
currentTask.Progress = 40
|
||||
currentTask.Message = "正在获取文章列表..."
|
||||
|
||||
// 获取文章列表
|
||||
log.Printf("📋 获取文章列表...")
|
||||
var articleList [][]string
|
||||
|
||||
if req.Pages > 0 {
|
||||
// 只获取指定页数
|
||||
log.Printf("📄 限制获取前 %d 页", req.Pages)
|
||||
for offset := 0; offset < req.Pages; offset++ {
|
||||
result, e := crawler.GetNextList(offset)
|
||||
if e != nil {
|
||||
log.Printf("❌ 获取第 %d 页失败: %v", offset+1, e)
|
||||
err = e
|
||||
break
|
||||
}
|
||||
|
||||
// 检查是否有数据
|
||||
mFlag, ok := result["m_flag"].(int)
|
||||
if !ok {
|
||||
if mFlagFloat, ok := result["m_flag"].(float64); ok {
|
||||
mFlag = int(mFlagFloat)
|
||||
}
|
||||
}
|
||||
if mFlag == 0 {
|
||||
log.Printf("ℹ️ 第 %d 页无更多数据", offset+1)
|
||||
break
|
||||
}
|
||||
|
||||
// 获取当前页的文章列表
|
||||
log.Printf("📝 尝试从 result 中提取 passage_list...")
|
||||
|
||||
// 先尝试 [][]string 类型(GetNextList 实际返回的类型)
|
||||
if passageListStr, ok := result["passage_list"].([][]string); ok {
|
||||
log.Printf("✅ passage_list 提取成功([][]string),包含 %d 个元素", len(passageListStr))
|
||||
for idx, strArr := range passageListStr {
|
||||
articleList = append(articleList, strArr)
|
||||
log.Printf("✅ 添加第 %d 篇文章: %v", idx+1, strArr)
|
||||
}
|
||||
} else if passageList, ok := result["passage_list"].([]interface{}); ok {
|
||||
// 备用:尝试 []interface{} 类型
|
||||
log.Printf("✅ passage_list 提取成功([]interface{}),包含 %d 个元素", len(passageList))
|
||||
for idx, item := range passageList {
|
||||
if arr, ok := item.([]interface{}); ok {
|
||||
strArr := make([]string, len(arr))
|
||||
for i, v := range arr {
|
||||
if s, ok := v.(string); ok {
|
||||
strArr[i] = s
|
||||
}
|
||||
}
|
||||
articleList = append(articleList, strArr)
|
||||
log.Printf("✅ 添加第 %d 篇文章: %v", idx+1, strArr)
|
||||
} else {
|
||||
log.Printf("❌ 第 %d 个 item 不是 []interface{} 类型,实际类型: %T", idx+1, item)
|
||||
}
|
||||
}
|
||||
} else {
|
||||
log.Printf("❌ passage_list 类型断言失败,实际类型: %T", result["passage_list"])
|
||||
}
|
||||
|
||||
log.Printf("✅ 已获取第 %d/%d 页,当前累计 %d 篇文章", offset+1, req.Pages, len(articleList))
|
||||
|
||||
// 添加延迟
|
||||
if offset < req.Pages-1 {
|
||||
time.Sleep(2 * time.Second)
|
||||
}
|
||||
}
|
||||
|
||||
// 转换链接
|
||||
log.Printf("🔗 转换文章链接...转换前共 %d 篇", len(articleList))
|
||||
articleList = crawler.TransformLinks(articleList)
|
||||
log.Printf("✅ 链接转换完成,共 %d 篇文章", len(articleList))
|
||||
} else {
|
||||
// 获取全部文章
|
||||
log.Printf("📄 获取全部文章")
|
||||
articleList, err = crawler.GetArticleList()
|
||||
}
|
||||
if err != nil {
|
||||
log.Printf("❌ 获取文章列表失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "获取文章列表失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
if len(articleList) == 0 {
|
||||
log.Printf("⚠️ 文章列表为空")
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "公众号文章列表为空,可能是 Access Token 无效或公众号无文章"})
|
||||
return
|
||||
}
|
||||
|
||||
log.Printf("✅ 获取到 %d 篇文章", len(articleList))
|
||||
|
||||
currentTask.Progress = 60
|
||||
currentTask.Message = fmt.Sprintf("正在获取文章详情 (0/%d)...", len(articleList))
|
||||
|
||||
// 创建保存目录
|
||||
dataDir := "../data"
|
||||
officialPath := filepath.Join(dataDir, officialName)
|
||||
log.Printf("📁 创建保存目录: %s", officialPath)
|
||||
if err := os.MkdirAll(officialPath, 0755); err != nil {
|
||||
log.Printf("❌ 创建保存目录失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "创建保存目录失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
// 获取文章详情
|
||||
log.Printf("📊 开始获取文章详情数据...")
|
||||
err = crawler.GetDetailList(articleList, officialPath)
|
||||
if err != nil {
|
||||
log.Printf("❌ 获取文章详情失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "获取文章详情失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
log.Printf("✅ 文章详情获取完成")
|
||||
|
||||
currentTask.Running = false
|
||||
currentTask.Progress = 100
|
||||
currentTask.Message = "文章详情获取完成"
|
||||
|
||||
// 统计文章详情文件数量
|
||||
detailPath := filepath.Join(officialPath, "文章详细")
|
||||
var detailFiles []string
|
||||
if entries, err := os.ReadDir(detailPath); err == nil {
|
||||
for _, entry := range entries {
|
||||
if !entry.IsDir() && strings.HasSuffix(entry.Name(), "_文章详情.txt") {
|
||||
detailFiles = append(detailFiles, entry.Name())
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if len(detailFiles) == 0 {
|
||||
// 检查主目录
|
||||
log.Printf("⚠️ 文章详细目录下未找到文件,检查主目录...")
|
||||
if entries, err := os.ReadDir(officialPath); err == nil {
|
||||
for _, entry := range entries {
|
||||
if !entry.IsDir() && strings.HasSuffix(entry.Name(), "_文章详情.txt") {
|
||||
detailFiles = append(detailFiles, entry.Name())
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
log.Printf("✅ 找到 %d 个文章详情文件", len(detailFiles))
|
||||
log.Print(strings.Repeat("=", 60) + "\n")
|
||||
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Message: fmt.Sprintf("文章详情获取成功,共 %d 篇文章", len(detailFiles)),
|
||||
Data: map[string]interface{}{
|
||||
"account": officialName,
|
||||
"articleCount": len(detailFiles),
|
||||
"path": officialPath,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
// min returns the smaller of two integers.
|
||||
func min(a, b int) int {
|
||||
if a < b {
|
||||
return a
|
||||
}
|
||||
return b
|
||||
}
|
||||
|
||||
// parseAccessToken extracts the access-token parameters (__biz, uin, key, pass_ticket) from a URL or query string.
|
||||
func parseAccessToken(accessToken string) (map[string]string, error) {
|
||||
params := make(map[string]string)
|
||||
|
||||
// 如果是完整URL,解析参数
|
||||
if strings.HasPrefix(accessToken, "http://") || strings.HasPrefix(accessToken, "https://") {
|
||||
parsedURL, err := url.Parse(accessToken)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("URL格式错误: %v", err)
|
||||
}
|
||||
query := parsedURL.Query()
|
||||
params["biz"] = query.Get("__biz")
|
||||
params["uin"] = query.Get("uin")
|
||||
params["key"] = query.Get("key")
|
||||
params["pass_ticket"] = query.Get("pass_ticket")
|
||||
} else {
|
||||
// 尝试使用正则表达式提取参数
|
||||
bizRegex := regexp.MustCompile(`__biz=([^&]+)`)
|
||||
if match := bizRegex.FindStringSubmatch(accessToken); len(match) > 1 {
|
||||
params["biz"] = match[1]
|
||||
}
|
||||
|
||||
uinRegex := regexp.MustCompile(`uin=([^&]+)`)
|
||||
if match := uinRegex.FindStringSubmatch(accessToken); len(match) > 1 {
|
||||
params["uin"] = match[1]
|
||||
}
|
||||
|
||||
keyRegex := regexp.MustCompile(`key=([^&]+)`)
|
||||
if match := keyRegex.FindStringSubmatch(accessToken); len(match) > 1 {
|
||||
params["key"] = match[1]
|
||||
}
|
||||
|
||||
passTicketRegex := regexp.MustCompile(`pass_ticket=([^&]+)`)
|
||||
if match := passTicketRegex.FindStringSubmatch(accessToken); len(match) > 1 {
|
||||
params["pass_ticket"] = match[1]
|
||||
}
|
||||
}
|
||||
|
||||
// 验证必需参数
|
||||
if params["biz"] == "" {
|
||||
return nil, fmt.Errorf("缺少__biz参数")
|
||||
}
|
||||
if params["uin"] == "" {
|
||||
return nil, fmt.Errorf("缺少uin参数")
|
||||
}
|
||||
if params["key"] == "" {
|
||||
return nil, fmt.Errorf("缺少key参数")
|
||||
}
|
||||
if params["pass_ticket"] == "" {
|
||||
return nil, fmt.Errorf("缺少pass_ticket参数")
|
||||
}
|
||||
|
||||
return params, nil
|
||||
}
|
||||
|
||||
// getDataListHandler lists the downloaded official-account data.
|
||||
func getDataListHandler(w http.ResponseWriter, r *http.Request) {
|
||||
dataDir := "../data"
|
||||
@@ -541,3 +878,348 @@ func writeJSON(w http.ResponseWriter, data interface{}) {
|
||||
w.Header().Set("Content-Type", "application/json; charset=utf-8")
|
||||
json.NewEncoder(w).Encode(data)
|
||||
}
|
||||
|
||||
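// generateToken returns a random 32-byte token encoded as 64 hex characters.
func generateToken() string {
	b := make([]byte, 32)
	// If crypto/rand fails, no secure randomness is available; never issue a weak token.
	if _, err := rand.Read(b); err != nil {
		panic(fmt.Sprintf("crypto/rand failed: %v", err))
	}
	return hex.EncodeToString(b)
}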
|
||||
|
||||
// callPythonScript runs a Python script from the database directory and
// returns its combined stdout/stderr output.
func callPythonScript(scriptPath string, args ...string) (string, error) {
	// Build the Python command line.
	cmdArgs := append([]string{scriptPath}, args...)
	cmd := exec.Command("python", cmdArgs...)

	// Run from the database directory.
	dbDir, _ := filepath.Abs(filepath.Join("..", "..", "database"))
	cmd.Dir = dbDir

	// Return the output even on failure: callers such as registerHandler
	// inspect it to classify the error, so it must not be dropped.
	output, err := cmd.CombinedOutput()
	if err != nil {
		return string(output), fmt.Errorf("%w: %s", err, output)
	}

	return string(output), nil
}
|
||||
|
||||
// registerHandler handles user registration.
|
||||
func registerHandler(w http.ResponseWriter, r *http.Request) {
|
||||
if r.Method != "POST" {
|
||||
writeJSON(w, Response{Success: false, Message: "仅支持POST请求", Code: 405})
|
||||
return
|
||||
}
|
||||
|
||||
var req RegisterRequest
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
writeJSON(w, Response{Success: false, Message: "请求参数错误", Code: 400})
|
||||
return
|
||||
}
|
||||
|
||||
// 验证输入
|
||||
if req.Username == "" || req.Password == "" || req.Email == "" {
|
||||
writeJSON(w, Response{Success: false, Message: "用户名、密码和邮箱不能为空", Code: 400})
|
||||
return
|
||||
}
|
||||
|
||||
// 调用Python脚本创建用户
|
||||
scriptPath := "user_cli.py"
|
||||
args := []string{"create", req.Username, req.Password, req.Email}
|
||||
|
||||
output, err := callPythonScript(scriptPath, args...)
|
||||
if err != nil {
|
||||
log.Printf("注册失败: %v, 输出: %s", err, output)
|
||||
|
||||
// 判断错误类型
|
||||
if strings.Contains(output, "用户名已存在") || strings.Contains(output, "邮箱已被注册") {
|
||||
writeJSON(w, Response{Success: false, Message: "用户名或邮箱已存在", Code: 409})
|
||||
} else if strings.Contains(output, "验证错误") {
|
||||
writeJSON(w, Response{Success: false, Message: output, Code: 400})
|
||||
} else {
|
||||
writeJSON(w, Response{Success: false, Message: "注册失败", Code: 500})
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
log.Printf("用户注册成功: %s", req.Username)
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Message: "注册成功",
|
||||
Code: 200,
|
||||
Data: map[string]interface{}{
|
||||
"username": req.Username,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
// loginHandler handles user login.
|
||||
func loginHandler(w http.ResponseWriter, r *http.Request) {
|
||||
if r.Method != "POST" {
|
||||
writeJSON(w, Response{Success: false, Message: "仅支持POST请求", Code: 405})
|
||||
return
|
||||
}
|
||||
|
||||
var req LoginRequest
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
writeJSON(w, Response{Success: false, Message: "请求参数错误", Code: 400})
|
||||
return
|
||||
}
|
||||
|
||||
// 验证输入
|
||||
if req.Username == "" || req.Password == "" {
|
||||
writeJSON(w, Response{Success: false, Message: "用户名和密码不能为空", Code: 400})
|
||||
return
|
||||
}
|
||||
|
||||
// 调用Python脚本验证用户
|
||||
scriptPath := "user_cli.py"
|
||||
args := []string{"verify", req.Username, req.Password}
|
||||
|
||||
output, err := callPythonScript(scriptPath, args...)
|
||||
log.Printf("🔍 Python输出: %s", output)
|
||||
|
||||
if err != nil {
|
||||
log.Printf("❌ 登录失败: %v", err)
|
||||
writeJSON(w, Response{Success: false, Message: "用户名或密码错误", Code: 401})
|
||||
return
|
||||
}
|
||||
|
||||
// 生成token
|
||||
token := generateToken()
|
||||
|
||||
// 从输出中解析user_id和用户信息
|
||||
var userData map[string]interface{}
|
||||
if err := json.Unmarshal([]byte(output), &userData); err != nil {
|
||||
log.Printf("❌ 解析用户数据失败: %v, 输出: %s", err, output)
|
||||
writeJSON(w, Response{Success: false, Message: "服务器内部错误", Code: 500})
|
||||
return
|
||||
}
|
||||
|
||||
// 检查是否成功
|
||||
if success, ok := userData["success"].(bool); !ok || !success {
|
||||
log.Printf("❌ 用户验证失败: %v", userData)
|
||||
writeJSON(w, Response{Success: false, Message: "用户名或密码错误", Code: 401})
|
||||
return
|
||||
}
|
||||
|
||||
userID := 0
|
||||
if uid, ok := userData["user_id"].(float64); ok {
|
||||
userID = int(uid)
|
||||
}
|
||||
|
||||
// 存储session
|
||||
sessions[token] = &Session{
|
||||
Token: token,
|
||||
UserID: userID,
|
||||
Expiry: time.Now().Add(24 * time.Hour), // 24小时过期
|
||||
}
|
||||
|
||||
log.Printf("✅ 用户登录成功: %s", req.Username) // do not write the session token to the log
|
||||
|
||||
// 构建user_info,不包含密码相关和success标记
|
||||
userInfo := make(map[string]interface{})
|
||||
for k, v := range userData {
|
||||
if k != "password_hash" && k != "success" {
|
||||
userInfo[k] = v
|
||||
}
|
||||
}
|
||||
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Message: "登录成功",
|
||||
Code: 200,
|
||||
Data: map[string]interface{}{
|
||||
"token": token,
|
||||
"user_id": userID,
|
||||
"user_info": userInfo,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
// logoutHandler handles user logout.
|
||||
func logoutHandler(w http.ResponseWriter, r *http.Request) {
|
||||
if r.Method != "POST" {
|
||||
writeJSON(w, Response{Success: false, Message: "仅支持POST请求", Code: 405})
|
||||
return
|
||||
}
|
||||
|
||||
// 从请求头中获取token
|
||||
token := r.Header.Get("Authorization")
|
||||
if token == "" {
|
||||
var req struct {
|
||||
Token string `json:"token"`
|
||||
}
|
||||
json.NewDecoder(r.Body).Decode(&req)
|
||||
token = req.Token
|
||||
}
|
||||
|
||||
if token == "" {
|
||||
writeJSON(w, Response{Success: false, Message: "Token不能为空", Code: 400})
|
||||
return
|
||||
}
|
||||
|
||||
// 删除session
|
||||
delete(sessions, token)
|
||||
|
||||
log.Printf("用户登出成功, token: %s", token)
|
||||
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Message: "登出成功",
|
||||
Code: 200,
|
||||
})
|
||||
}
|
||||
|
||||
// getUserInfoHandler returns the profile of the logged-in user.
|
||||
func getUserInfoHandler(w http.ResponseWriter, r *http.Request) {
|
||||
if r.Method != "GET" {
|
||||
writeJSON(w, Response{Success: false, Message: "仅支持GET请求", Code: 405})
|
||||
return
|
||||
}
|
||||
|
||||
// 从请求头中获取token
|
||||
token := r.Header.Get("Authorization")
|
||||
if token == "" {
|
||||
token = r.URL.Query().Get("token")
|
||||
}
|
||||
|
||||
if token == "" {
|
||||
writeJSON(w, Response{Success: false, Message: "Token不能为空", Code: 401})
|
||||
return
|
||||
}
|
||||
|
||||
// 验证session
|
||||
session, ok := sessions[token]
|
||||
if !ok || session.Expiry.Before(time.Now()) {
|
||||
if ok {
|
||||
delete(sessions, token) // 删除过期session
|
||||
}
|
||||
writeJSON(w, Response{Success: false, Message: "Token无效或已过期", Code: 401})
|
||||
return
|
||||
}
|
||||
|
||||
// 调用Python脚本获取用户信息
|
||||
scriptPath := "user_cli.py"
|
||||
args := []string{"get", fmt.Sprintf("%d", session.UserID)}
|
||||
|
||||
output, err := callPythonScript(scriptPath, args...)
|
||||
if err != nil {
|
||||
log.Printf("获取用户信息失败: %v", err)
|
||||
writeJSON(w, Response{Success: false, Message: "获取用户信息失败", Code: 500})
|
||||
return
|
||||
}
|
||||
|
||||
// 解析用户信息
|
||||
var userData map[string]interface{}
|
||||
if err := json.Unmarshal([]byte(output), &userData); err != nil {
|
||||
log.Printf("解析用户信息失败: %v", err)
|
||||
writeJSON(w, Response{Success: false, Message: "解析用户信息失败", Code: 500})
|
||||
return
|
||||
}
|
||||
|
||||
// 删除密码哈希
|
||||
delete(userData, "password_hash")
|
||||
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Message: "获取成功",
|
||||
Code: 200,
|
||||
Data: userData,
|
||||
})
|
||||
}
|
||||
|
||||
// updateUserHandler updates the profile of the logged-in user.
|
||||
func updateUserHandler(w http.ResponseWriter, r *http.Request) {
|
||||
if r.Method != "POST" {
|
||||
writeJSON(w, Response{Success: false, Message: "仅支持POST请求", Code: 405})
|
||||
return
|
||||
}
|
||||
|
||||
// 从请求头中获取token
|
||||
token := r.Header.Get("Authorization")
|
||||
if token == "" {
|
||||
writeJSON(w, Response{Success: false, Message: "Token不能为空", Code: 401})
|
||||
return
|
||||
}
|
||||
|
||||
// 验证session
|
||||
session, ok := sessions[token]
|
||||
if !ok || session.Expiry.Before(time.Now()) {
|
||||
if ok {
|
||||
delete(sessions, token) // 删除过期session
|
||||
}
|
||||
writeJSON(w, Response{Success: false, Message: "Token无效或已过期", Code: 401})
|
||||
return
|
||||
}
|
||||
|
||||
// 解析请求体
|
||||
var req struct {
|
||||
UserID int `json:"user_id"`
|
||||
Email string `json:"email"`
|
||||
Bio string `json:"bio"`
|
||||
}
|
||||
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
log.Printf("❌ 解析请求体失败: %v", err)
|
||||
writeJSON(w, Response{Success: false, Message: "请求参数错误", Code: 400})
|
||||
return
|
||||
}
|
||||
|
||||
log.Printf("🔍 更新用户信息: user_id=%d, email=%s", req.UserID, req.Email)
|
||||
|
||||
// 验证用户ID与session一致
|
||||
if req.UserID != session.UserID {
|
||||
log.Printf("❌ 用户ID不匹配: req=%d, session=%d", req.UserID, session.UserID)
|
||||
writeJSON(w, Response{Success: false, Message: "无权操作", Code: 403})
|
||||
return
|
||||
}
|
||||
|
||||
// 调用Python脚本更新用户信息
|
||||
scriptPath := "user_cli.py"
|
||||
args := []string{"update", fmt.Sprintf("%d", req.UserID)}
|
||||
|
||||
// 添加需要更新的字段
|
||||
if req.Email != "" {
|
||||
args = append(args, "--email", req.Email)
|
||||
}
|
||||
if req.Bio != "" {
|
||||
args = append(args, "--bio", req.Bio)
|
||||
}
|
||||
|
||||
output, err := callPythonScript(scriptPath, args...)
|
||||
log.Printf("🔍 Python输出: %s", output)
|
||||
|
||||
if err != nil {
|
||||
log.Printf("❌ 更新用户信息失败: %v", err)
|
||||
writeJSON(w, Response{Success: false, Message: "更新失败", Code: 500})
|
||||
return
|
||||
}
|
||||
|
||||
// 解析响应
|
||||
var result map[string]interface{}
|
||||
if err := json.Unmarshal([]byte(output), &result); err != nil {
|
||||
log.Printf("❌ 解析响应失败: %v", err)
|
||||
writeJSON(w, Response{Success: false, Message: "服务器内部错误", Code: 500})
|
||||
return
|
||||
}
|
||||
|
||||
// 检查是否成功
|
||||
if success, ok := result["success"].(bool); !ok || !success {
|
||||
errMsg := "更新失败"
|
||||
if msg, ok := result["error"].(string); ok {
|
||||
errMsg = msg
|
||||
}
|
||||
writeJSON(w, Response{Success: false, Message: errMsg, Code: 500})
|
||||
return
|
||||
}
|
||||
|
||||
log.Printf("✅ 用户信息更新成功: user_id=%d", req.UserID)
|
||||
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Message: "更新成功",
|
||||
Code: 200,
|
||||
})
|
||||
}
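
The argument list passed to `user_cli.py update` is built inline in `updateUserHandler`. Pulling that step into a pure function would make it testable without spawning Python. A minimal sketch, assuming the same `--email` and `--bio` flags the handler uses; `buildUpdateArgs` is a hypothetical helper, not part of this commit.

```go
package main

import (
	"fmt"
	"strconv"
)

// buildUpdateArgs assembles the arguments for "user_cli.py update".
// As in the handler, an empty string means "leave this field unchanged",
// so this shape cannot be used to clear a field.
func buildUpdateArgs(userID int, email, bio string) []string {
	args := []string{"update", strconv.Itoa(userID)}
	if email != "" {
		args = append(args, "--email", email)
	}
	if bio != "" {
		args = append(args, "--bio", bio)
	}
	return args
}

func main() {
	// prints [update 42 --email a@example.com]
	fmt.Println(buildUpdateArgs(42, "a@example.com", ""))
}
```

If clearing a field ever becomes a requirement, pointer fields (`*string`) in the request struct would be one way to tell "absent" apart from "empty".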

@@ -2,8 +2,8 @@
 chcp 65001 >nul
 title 微信公众号文章爬虫 - API服务器
 
-:: Check whether api_server.exe exists
-if not exist "api_server.exe" (
+:: Check whether api-server.exe exists
+if not exist "api-server.exe" (
 echo ===============================================
 echo ⚠️ API服务器未编译
 echo ===============================================
@@ -20,4 +20,4 @@ if not exist "api_server.exe" (
 
 :: Start the API server
 cls
-api_server.exe
+api-server.exe
@@ -205,7 +205,6 @@ func startCrawling(cfg *configs.Config) {
 ReadCount: stats["read_num"],
 LikeCount: stats["old_like_num"],
 ShareCount: stats["share_num"],
-ShowRead: stats["show_read"],
 Comments: comments,
 CommentLikes: commentLikes,
 CommentID: commentID,
BIN
backend/cmd/wechat-crawler.exe
Normal file
Binary file not shown.
22
backend/go.mod
Normal file
@@ -0,0 +1,22 @@
+module github.com/wechat-crawler
+
+go 1.24.0
+
+require (
+	github.com/go-resty/resty/v2 v2.17.0
+	modernc.org/sqlite v1.40.1
+)
+
+require (
+	github.com/dustin/go-humanize v1.0.1 // indirect
+	github.com/google/uuid v1.6.0 // indirect
+	github.com/mattn/go-isatty v0.0.20 // indirect
+	github.com/ncruces/go-strftime v0.1.9 // indirect
+	github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
+	golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b // indirect
+	golang.org/x/net v0.43.0 // indirect
+	golang.org/x/sys v0.36.0 // indirect
+	modernc.org/libc v1.66.10 // indirect
+	modernc.org/mathutil v1.7.1 // indirect
+	modernc.org/memory v1.11.0 // indirect
+)
57
backend/go.sum
Normal file
@@ -0,0 +1,57 @@
+github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
+github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
+github.com/go-resty/resty/v2 v2.17.0 h1:pW9DeXcaL4Rrym4EZ8v7L19zZiIlWPg5YXAcVmt+gN0=
+github.com/go-resty/resty/v2 v2.17.0/go.mod h1:kCKZ3wWmwJaNc7S29BRtUhJwy7iqmn+2mLtQrOyQlVA=
+github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e h1:ijClszYn+mADRFY17kjQEVQ1XRhq2/JR1M3sGqeJoxs=
+github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA=
+github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
+github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
+github.com/gorilla/mux v1.8.1 h1:TuBL49tXwgrFYWhqrNgrUNEY92u81SPhu7sTdzQEiWY=
+github.com/gorilla/mux v1.8.1/go.mod h1:AKf9I4AEqPTmMytcMc0KkNouC66V3BtZ4qD5fmWSiMQ=
+github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
+github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
+github.com/ncruces/go-strftime v0.1.9 h1:bY0MQC28UADQmHmaF5dgpLmImcShSi2kHU9XLdhx/f4=
+github.com/ncruces/go-strftime v0.1.9/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
+github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE=
+github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
+golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b h1:M2rDM6z3Fhozi9O7NWsxAkg/yqS/lQJ6PmkyIV3YP+o=
+golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b/go.mod h1:3//PLf8L/X+8b4vuAfHzxeRUl04Adcb341+IGKfnqS8=
+golang.org/x/mod v0.27.0 h1:kb+q2PyFnEADO2IEF935ehFUXlWiNjJWtRNgBLSfbxQ=
+golang.org/x/mod v0.27.0/go.mod h1:rWI627Fq0DEoudcK+MBkNkCe0EetEaDSwJJkCcjpazc=
+golang.org/x/net v0.43.0 h1:lat02VYK2j4aLzMzecihNvTlJNQUq316m2Mr9rnM6YE=
+golang.org/x/net v0.43.0/go.mod h1:vhO1fvI4dGsIjh73sWfUVjj3N7CA9WkKJNQm2svM6Jg=
+golang.org/x/sync v0.16.0 h1:ycBJEhp9p4vXvUZNszeOq0kGTPghopOL8q0fq3vstxw=
+golang.org/x/sync v0.16.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
+golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.36.0 h1:KVRy2GtZBrk1cBYA7MKu5bEZFxQk4NIDV6RLVcC8o0k=
+golang.org/x/sys v0.36.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
+golang.org/x/time v0.12.0 h1:ScB/8o8olJvc+CQPWrK3fPZNfh7qgwCrY0zJmoEQLSE=
+golang.org/x/time v0.12.0/go.mod h1:CDIdPxbZBQxdj6cxyCIdrNogrJKMJ7pr37NYpMcMDSg=
+golang.org/x/tools v0.36.0 h1:kWS0uv/zsvHEle1LbV5LE8QujrxB3wfQyxHfhOk0Qkg=
+golang.org/x/tools v0.36.0/go.mod h1:WBDiHKJK8YgLHlcQPYQzNCkUxUypCaa5ZegCVutKm+s=
+modernc.org/cc/v4 v4.26.5 h1:xM3bX7Mve6G8K8b+T11ReenJOT+BmVqQj0FY5T4+5Y4=
+modernc.org/cc/v4 v4.26.5/go.mod h1:uVtb5OGqUKpoLWhqwNQo/8LwvoiEBLvZXIQ/SmO6mL0=
+modernc.org/ccgo/v4 v4.28.1 h1:wPKYn5EC/mYTqBO373jKjvX2n+3+aK7+sICCv4Fjy1A=
+modernc.org/ccgo/v4 v4.28.1/go.mod h1:uD+4RnfrVgE6ec9NGguUNdhqzNIeeomeXf6CL0GTE5Q=
+modernc.org/fileutil v1.3.40 h1:ZGMswMNc9JOCrcrakF1HrvmergNLAmxOPjizirpfqBA=
+modernc.org/fileutil v1.3.40/go.mod h1:HxmghZSZVAz/LXcMNwZPA/DRrQZEVP9VX0V4LQGQFOc=
+modernc.org/gc/v2 v2.6.5 h1:nyqdV8q46KvTpZlsw66kWqwXRHdjIlJOhG6kxiV/9xI=
+modernc.org/gc/v2 v2.6.5/go.mod h1:YgIahr1ypgfe7chRuJi2gD7DBQiKSLMPgBQe9oIiito=
+modernc.org/goabi0 v0.2.0 h1:HvEowk7LxcPd0eq6mVOAEMai46V+i7Jrj13t4AzuNks=
+modernc.org/goabi0 v0.2.0/go.mod h1:CEFRnnJhKvWT1c1JTI3Avm+tgOWbkOu5oPA8eH8LnMI=
+modernc.org/libc v1.66.10 h1:yZkb3YeLx4oynyR+iUsXsybsX4Ubx7MQlSYEw4yj59A=
+modernc.org/libc v1.66.10/go.mod h1:8vGSEwvoUoltr4dlywvHqjtAqHBaw0j1jI7iFBTAr2I=
+modernc.org/mathutil v1.7.1 h1:GCZVGXdaN8gTqB1Mf/usp1Y/hSqgI2vAGGP4jZMCxOU=
+modernc.org/mathutil v1.7.1/go.mod h1:4p5IwJITfppl0G4sUEDtCr4DthTaT47/N3aT6MhfgJg=
+modernc.org/memory v1.11.0 h1:o4QC8aMQzmcwCK3t3Ux/ZHmwFPzE6hf2Y5LbkRs+hbI=
+modernc.org/memory v1.11.0/go.mod h1:/JP4VbVC+K5sU2wZi9bHoq2MAkCnrt2r98UGeSK7Mjw=
+modernc.org/opt v0.1.4 h1:2kNGMRiUjrp4LcaPuLY2PzUfqM/w9N23quVwhKt5Qm8=
+modernc.org/opt v0.1.4/go.mod h1:03fq9lsNfvkYSfxrfUhZCWPk1lm4cq4N+Bh//bEtgns=
+modernc.org/sortutil v1.2.1 h1:+xyoGf15mM3NMlPDnFqrteY07klSFxLElE2PVuWIJ7w=
+modernc.org/sortutil v1.2.1/go.mod h1:7ZI3a3REbai7gzCLcotuw9AC4VZVpYMjDzETGsSMqJE=
+modernc.org/sqlite v1.40.1 h1:VfuXcxcUWWKRBuP8+BR9L7VnmusMgBNNnBYGEe9w/iY=
+modernc.org/sqlite v1.40.1/go.mod h1:9fjQZ0mB1LLP0GYrp39oOJXx/I2sxEnZtzCmEQIKvGE=
+modernc.org/strutil v1.2.1 h1:UneZBkQA+DX2Rp35KcM69cSsNES9ly8mQWD71HKlOA0=
+modernc.org/strutil v1.2.1/go.mod h1:EHkiggD70koQxjVdSBM3JKM7k6L0FbGE5eymy9i3B9A=
+modernc.org/token v1.1.0 h1:Xl7Ap9dKaEs5kLoOQeQmPWevfnk/DM5qcLcYlA8ys6Y=
+modernc.org/token v1.1.0/go.mod h1:UGzOrNV1mAFSEB63lOFHIpNRUVMvYTc6yu1SMY/XTDM=
BIN
backend/output/wechat-crawler.exe
Normal file
Binary file not shown.
@@ -25,7 +25,6 @@ type ArticleDetail struct {
 ReadCount string `json:"read_count"`
 LikeCount string `json:"like_count"`
 ShareCount string `json:"share_count"`
-ShowRead string `json:"show_read"`
 Comments []string `json:"comments"`
 CommentLikes []string `json:"comment_likes"`
 CommentID string `json:"comment_id"`
@@ -1624,7 +1623,6 @@ func (w *WechatCrawler) GetArticleDetail(link string) (*ArticleDetail, error) {
 ReadCount: stats["read_num"],
 LikeCount: stats["old_like_num"],
 ShareCount: stats["share_num"],
-ShowRead: stats["show_read"],
 Comments: comments,
 CommentLikes: commentLikes,
 CommentID: commentID,
@@ -1731,7 +1729,6 @@ func (c *WechatCrawler) SaveArticleDetailToExcel(article *ArticleDetail, filePat
 content.WriteString(fmt.Sprintf("阅读量: %s\n", article.ReadCount))
 content.WriteString(fmt.Sprintf("点赞数: %s\n", article.LikeCount))
 content.WriteString(fmt.Sprintf("转发数: %s\n", article.ShareCount))
-content.WriteString(fmt.Sprintf("在看数: %s\n", article.ShowRead))
 content.WriteString(strings.Repeat("=", 80))
 content.WriteString("\n\n")
 
BIN
backend/wechat-crawler.exe
Normal file
Binary file not shown.