2025-12-2genxin

This commit is contained in:
2025-12-02 14:58:52 +08:00
parent 4fef65bd93
commit be0954828c
36 changed files with 3352 additions and 1638 deletions

@@ -1,460 +0,0 @@
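# 📡 WeChat Official Account Article Crawler - API Reference
## Server Information
- **Base URL**: http://localhost:8080
- **Protocol**: HTTP/1.1
- **Data format**: JSON
- **Character encoding**: UTF-8
- **CORS**: enabled (all origins allowed)
## Unified Response Format
All API endpoints return the same envelope (the server returns `message` strings in Chinese):
```json
{
"success": true, // whether the request succeeded
"message": "操作成功", // status message (here: "operation succeeded")
"data": {} // payload (optional)
}
```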
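A client can decode this envelope once and treat `success: false` as an error. The following is a minimal Go sketch, not part of the project: the names `apiResponse` and `decodeResponse` are hypothetical, and the sample body is made up for illustration. Real responses are plain JSON, without the `//` annotations shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// apiResponse mirrors the documented envelope. Data stays as raw JSON so
// each endpoint can decode its own payload shape later.
// (Hypothetical client-side type, not taken from server.go.)
type apiResponse struct {
	Success bool            `json:"success"`
	Message string          `json:"message"`
	Data    json.RawMessage `json:"data,omitempty"`
}

// decodeResponse parses one response body and converts success=false
// into a Go error that carries the server's message.
func decodeResponse(body []byte) (apiResponse, error) {
	var r apiResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return r, fmt.Errorf("decode response: %w", err)
	}
	if !r.Success {
		return r, fmt.Errorf("api error: %s", r.Message)
	}
	return r, nil
}

func main() {
	body := []byte(`{"success":true,"message":"ok","data":{"homepage":"https://example.invalid"}}`)
	r, err := decodeResponse(body)
	fmt.Println(r.Success, r.Message, string(r.Data), err)
}
```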
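## Endpoints
### 1. Extract the Official Account Homepage
**Endpoint**: `/api/homepage/extract`
**Method**: POST
**Description**: Extracts the official account's homepage link from an article link.
#### Request Parameters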
```json
{
"url": "https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx"
}
```
| Parameter | Type | Required | Description |
|------|------|------|------|
| url | string | Yes | Link to an official account article |
#### Response Examples
**Success**:
```json
{
"success": true,
"message": "提取成功",
"data": {
"homepage": "https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=xxx&scene=124",
"output": "full command-line output"
}
}
```
**Failure**:
```json
{
"success": false,
"message": "未能提取到主页链接"
}
```
#### Usage Examples
**jQuery**:
```javascript
$.ajax({
url: 'http://localhost:8080/api/homepage/extract',
method: 'POST',
contentType: 'application/json',
data: JSON.stringify({
url: 'https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx'
}),
success: function(response) {
if (response.success) {
console.log('Homepage link:', response.data.homepage);
}
}
});
```
**curl**:
```bash
curl -X POST http://localhost:8080/api/homepage/extract \
-H "Content-Type: application/json" \
-d '{"url":"https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx"}'
```
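The same call can be made from Go. This is a sketch under assumptions: `extractHomepage`, `extractResult`, and `newStubServer` are hypothetical names, and the stub server stands in for the real API so that the example runs without the crawler. It returns the success body documented above with a placeholder message. Against a real deployment, pass `http.DefaultClient` and `http://localhost:8080` instead.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// extractResult covers only the fields this example reads.
type extractResult struct {
	Success bool   `json:"success"`
	Message string `json:"message"`
	Data    struct {
		Homepage string `json:"homepage"`
	} `json:"data"`
}

// extractHomepage POSTs {"url": articleURL} to /api/homepage/extract
// and returns data.homepage, or an error when success is false.
func extractHomepage(client *http.Client, baseURL, articleURL string) (string, error) {
	payload, err := json.Marshal(map[string]string{"url": articleURL})
	if err != nil {
		return "", err
	}
	resp, err := client.Post(baseURL+"/api/homepage/extract", "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var out extractResult
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if !out.Success {
		return "", fmt.Errorf("extract failed: %s", out.Message)
	}
	return out.Data.Homepage, nil
}

// newStubServer is a stand-in for the real API server (assumption:
// it always answers with the documented success response).
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json; charset=utf-8")
		fmt.Fprint(w, `{"success":true,"message":"ok","data":{"homepage":"https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=xxx&scene=124"}}`)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	homepage, err := extractHomepage(srv.Client(), srv.URL, "https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx")
	fmt.Println(homepage, err)
}
```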
---
### 2. Download a Single Article
**Endpoint**: `/api/article/download`
**Method**: POST
**Description**: Downloads the specified article.
#### Request Parameters
```json
{
"url": "https://mp.weixin.qq.com/s?__biz=xxx",
"save_image": true,
"save_content": true
}
```
| Parameter | Type | Required | Description |
|------|------|------|------|
| url | string | Yes | Article link |
| save_image | boolean | No | Whether to save images; defaults to false |
| save_content | boolean | No | Whether to save content; defaults to true |
#### Response Example
```json
{
"success": true,
"message": "下载任务已启动",
"data": {
"url": "https://mp.weixin.qq.com/s?__biz=xxx"
}
}
```
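Because `save_content` defaults to true, a handler has to tell "field omitted" apart from "field set to false". One common way in Go is to decode optional booleans into pointers. The sketch below assumes hypothetical names (`downloadRequest`, `downloadOptions`, `parseDownloadRequest`) and is not taken from `server.go`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// downloadRequest uses *bool so that an omitted field decodes to nil
// instead of false.
type downloadRequest struct {
	URL         string `json:"url"`
	SaveImage   *bool  `json:"save_image"`
	SaveContent *bool  `json:"save_content"`
}

// downloadOptions holds the request after defaults are applied.
type downloadOptions struct {
	URL         string
	SaveImage   bool
	SaveContent bool
}

// parseDownloadRequest applies the documented defaults:
// save_image=false and save_content=true.
func parseDownloadRequest(body []byte) (downloadOptions, error) {
	var req downloadRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return downloadOptions{}, err
	}
	if req.URL == "" {
		return downloadOptions{}, errors.New("url is required")
	}
	opts := downloadOptions{URL: req.URL, SaveImage: false, SaveContent: true}
	if req.SaveImage != nil {
		opts.SaveImage = *req.SaveImage
	}
	if req.SaveContent != nil {
		opts.SaveContent = *req.SaveContent
	}
	return opts, nil
}

func main() {
	opts, err := parseDownloadRequest([]byte(`{"url":"https://mp.weixin.qq.com/s?__biz=xxx"}`))
	fmt.Printf("%+v %v\n", opts, err)
}
```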
---
### 3. Get the Article List
**Endpoint**: `/api/article/list`
**Method**: POST
**Description**: Fetches an official account's article list in bulk.
#### Request Parameters
```json
{
"access_token": "https://mp.weixin.qq.com/mp/profile_ext?action=xxx&appmsg_token=xxx",
"pages": 0
}
```
| Parameter | Type | Required | Description |
|------|------|------|------|
| access_token | string | Yes | URL that contains `appmsg_token` |
| pages | integer | No | Number of pages to fetch; 0 means all; defaults to 0 |
#### Response Example
```json
{
"success": true,
"message": "任务已启动"
}
```
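Since `access_token` is really a URL, a caller may want to confirm that it carries the expected query parameters before submitting it. The helper below is an illustrative sketch: `requireQueryParams` is a hypothetical name, and which parameters the server actually requires is an assumption to check against `server.go`.

```go
package main

import (
	"fmt"
	"net/url"
)

// requireQueryParams parses rawURL and returns the named query
// parameters, failing on the first one that is missing or empty.
func requireQueryParams(rawURL string, names ...string) (map[string]string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, fmt.Errorf("parse url: %w", err)
	}
	query := u.Query()
	out := make(map[string]string, len(names))
	for _, name := range names {
		value := query.Get(name)
		if value == "" {
			return nil, fmt.Errorf("missing query parameter %q", name)
		}
		out[name] = value
	}
	return out, nil
}

func main() {
	params, err := requireQueryParams(
		"https://mp.weixin.qq.com/mp/profile_ext?action=xxx&appmsg_token=xxx",
		"appmsg_token",
	)
	fmt.Println(params, err)
}
```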
---
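### 4. Batch Download Articles
**Endpoint**: `/api/article/batch`
**Method**: POST
**Description**: Downloads all articles of an official account in bulk.
#### Request Parameters
```json
{
"official_account": "official account name or article link",
"save_image": true,
"save_content": true
}
```
| Parameter | Type | Required | Description |
|------|------|------|------|
| official_account | string | Yes | Official account name, or a link to any of its articles |
| save_image | boolean | No | Whether to save images; defaults to false |
| save_content | boolean | No | Whether to save content; defaults to true |
#### Response Example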
```json
{
"success": true,
"message": "任务已启动"
}
```
---
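### 5. Get the Data List
**Endpoint**: `/api/data/list`
**Method**: GET
**Description**: Lists the official account data that has already been downloaded.
#### Request Parameters
None.
#### Response Example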
```json
{
"success": true,
"data": [
{
"name": "研招网资讯",
"article_count": 125,
"path": "D:\\workspace\\Access_wechat_article\\backend\\data\\研招网资讯",
"last_update": "2025-11-27"
}
]
}
```
| Field | Type | Description |
|------|------|------|
| name | string | Official account name |
| article_count | integer | Number of articles |
| path | string | Storage path |
| last_update | string | Time of the last update |
#### Usage Examples
**jQuery**:
```javascript
$.get('http://localhost:8080/api/data/list', function(response) {
if (response.success) {
console.log('Data list:', response.data);
}
});
```
**curl**:
```bash
curl http://localhost:8080/api/data/list
```
---
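### 6. Get Task Status
**Endpoint**: `/api/task/status`
**Method**: GET
**Description**: Returns the execution status of the current task.
#### Request Parameters
None.
#### Response Examples
**Task running**: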
```json
{
"success": true,
"data": {
"running": true,
"progress": 45,
"message": "正在下载第10篇文章..."
}
}
```
**No task running**:
```json
{
"success": true,
"data": {
"running": false,
"progress": 0,
"message": ""
}
}
```
| Field | Type | Description |
|------|------|------|
| running | boolean | Whether a task is currently running |
| progress | integer | Task progress, 0-100 |
| message | string | Task status description |
| error | string | Error message (optional) |
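
A client typically polls this endpoint until `running` becomes false. The loop below is a hedged sketch: `taskStatus` mirrors the fields in the table, while `pollUntilDone` and the `fetch` callback are hypothetical. In real use, `fetch` would issue `GET /api/task/status` and decode `data`. Note that an idle server also reports `running: false`, so polling should start only after a task has been submitted.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// taskStatus mirrors the documented fields of the status payload.
type taskStatus struct {
	Running  bool   `json:"running"`
	Progress int    `json:"progress"`
	Message  string `json:"message"`
	Error    string `json:"error,omitempty"`
}

// pollUntilDone calls fetch until the task stops running, reports an
// error, or maxPolls attempts have been made.
func pollUntilDone(fetch func() (taskStatus, error), interval time.Duration, maxPolls int) (taskStatus, error) {
	var last taskStatus
	for i := 0; i < maxPolls; i++ {
		st, err := fetch()
		if err != nil {
			return st, err
		}
		last = st
		if st.Error != "" {
			return st, errors.New(st.Error)
		}
		if !st.Running {
			return st, nil
		}
		time.Sleep(interval)
	}
	return last, errors.New("task still running after the maximum number of polls")
}

func main() {
	// A fake fetcher that reports "running" twice and then "done".
	calls := 0
	fake := func() (taskStatus, error) {
		calls++
		if calls < 3 {
			return taskStatus{Running: true, Progress: 40 * calls}, nil
		}
		return taskStatus{Running: false, Progress: 100}, nil
	}
	st, err := pollUntilDone(fake, 10*time.Millisecond, 10)
	fmt.Println(st.Progress, calls, err)
}
```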
---
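## Error Codes
### HTTP Status Codes
| Status code | Description |
|--------|------|
| 200 | Request succeeded |
| 400 | Invalid request parameters |
| 500 | Internal server error |
### Business Errors
All business errors are reported through the `success` and `message` fields of the response:
```json
{
"success": false,
"message": "specific error message"
}
```
Common error messages:
| Error message | Meaning | Resolution |
|----------|------|----------|
| 请求参数错误 (invalid request parameters) | The JSON is malformed or a required parameter is missing | Check the request body format |
| 执行失败 (execution failed) | The backend program failed while running | Check the detailed error output |
| 未能提取到主页链接 (could not extract the homepage link) | The article link is malformed or could not be parsed | Use a valid article link |
| 读取数据目录失败 (failed to read the data directory) | The data directory does not exist or is not accessible | Check the directory permissions |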
---
## Development Guide
### Local Testing
1. **Start the API server**:
```bash
cd backend\api
start_api.bat
```
2. **Test the endpoints**:
```bash
# Test homepage extraction
curl -X POST http://localhost:8080/api/homepage/extract \
-H "Content-Type: application/json" \
-d "{\"url\":\"<article-link>\"}"
# Test the data list
curl http://localhost:8080/api/data/list
```
### CORS Configuration
CORS is enabled on the API server and allows requests from all origins:
```go
w.Header().Set("Access-Control-Allow-Origin", "*")
w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
```
To restrict access to specific domains, modify the `corsMiddleware` function in `server.go`.
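
One possible shape for such a restriction is an allowlist that echoes the request's `Origin` only when it matches. This is a sketch, not the project's code: `restrictedCORS`, `originAllowed`, and `corsHeaderFor` are hypothetical names, and the example origins are placeholders.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// originAllowed reports whether origin exactly matches an allowlist entry.
func originAllowed(origin string, allowlist []string) bool {
	for _, allowed := range allowlist {
		if origin == allowed {
			return true
		}
	}
	return false
}

// restrictedCORS sets the CORS headers only for allowlisted origins.
// Vary: Origin keeps caches from reusing a response across origins.
func restrictedCORS(allowlist []string, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		origin := r.Header.Get("Origin")
		if originAllowed(origin, allowlist) {
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Set("Vary", "Origin")
			w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
			w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
		}
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusOK)
			return
		}
		next(w, r)
	}
}

// corsHeaderFor runs one in-memory request through the middleware and
// returns the Access-Control-Allow-Origin header it produced.
func corsHeaderFor(origin string, allowlist []string) string {
	handler := restrictedCORS(allowlist, func(w http.ResponseWriter, r *http.Request) {})
	req := httptest.NewRequest(http.MethodGet, "/api/data/list", nil)
	req.Header.Set("Origin", origin)
	rec := httptest.NewRecorder()
	handler(rec, req)
	return rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	allow := []string{"https://admin.example.com"}
	fmt.Printf("%q\n", corsHeaderFor("https://admin.example.com", allow))
	fmt.Printf("%q\n", corsHeaderFor("https://other.example.org", allow))
}
```

Browsers enforce CORS, so this limits which web pages can read responses. It does not replace authentication for non-browser clients.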
### Timeout Settings
The default HTTP timeout is 30 seconds.
To change it, add the following to `server.go`:
```go
server := &http.Server{
Addr: ":8080",
ReadTimeout: 30 * time.Second,
WriteTimeout: 30 * time.Second,
}
```
### Logging
The API server writes its logs to the console through Go's standard `log` package:
```go
log.Printf("[%s] %s - %s", r.Method, r.URL.Path, message)
```
---
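## API Roadmap
### v1.1.0 (planned)
- [ ] Add user authentication
- [ ] Support task queue management
- [ ] Add download progress push (WebSocket)
- [ ] Provide an article search endpoint
### v1.2.0 (planned)
- [ ] Statistics and analytics endpoints
- [ ] Export (PDF/Word)
- [ ] Batch task management
- [ ] Scheduled task support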
---
## Tech Stack
- **Language**: Go 1.20+
- **Web framework**: net/http (standard library)
- **Data format**: JSON
- **Concurrency model**: goroutines
---
## Performance Notes
### Concurrency
- Multiple clients can access the server at the same time
- Only one crawler task can run at a time (`currentTask`)
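
Because `net/http` serves each request on its own goroutine, a one-task-at-a-time rule needs a synchronized check-and-set. The sketch below shows one way to express that with a mutex. `taskGuard`, `tryStart`, and `finish` are hypothetical names and are not taken from `server.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// taskGuard admits at most one running task at a time.
type taskGuard struct {
	mu      sync.Mutex
	running bool
}

// tryStart marks the task as running and returns true, or returns
// false without changing state if a task is already running.
func (g *taskGuard) tryStart() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.running {
		return false
	}
	g.running = true
	return true
}

// finish marks the task as no longer running.
func (g *taskGuard) finish() {
	g.mu.Lock()
	g.running = false
	g.mu.Unlock()
}

func main() {
	var guard taskGuard
	fmt.Println(guard.tryStart()) // first caller wins
	fmt.Println(guard.tryStart()) // second caller is rejected
	guard.finish()
	fmt.Println(guard.tryStart()) // free again after finish
}
```

A handler would call `tryStart` first, reject the request if it returns false, and `defer guard.finish()` otherwise.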
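### Resource Usage
- CPU: low (mostly I/O-bound)
- Memory: <50MB
- Disk: depends on the number of downloaded articles
### Optimization Suggestions
1. Use a connection pool for outgoing HTTP requests
2. Implement a task queue
3. Cache results
4. Enable gzip compression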
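
As an illustration of the task queue suggestion, a buffered channel drained by a single worker runs jobs strictly one at a time and lets callers enqueue without blocking. This is a minimal sketch with hypothetical names (`taskQueue`, `submit`, `shutdown`). A production queue would also need job IDs, status tracking, and cancellation.

```go
package main

import "fmt"

// taskQueue runs submitted jobs one at a time, in submission order.
type taskQueue struct {
	jobs chan func()
	done chan struct{}
}

// newTaskQueue starts a single worker goroutine that drains the queue.
func newTaskQueue(size int) *taskQueue {
	q := &taskQueue{jobs: make(chan func(), size), done: make(chan struct{})}
	go func() {
		defer close(q.done)
		for job := range q.jobs {
			job()
		}
	}()
	return q
}

// submit enqueues a job without blocking; it returns false when the
// buffer is full so the caller can answer with a "busy" response.
func (q *taskQueue) submit(job func()) bool {
	select {
	case q.jobs <- job:
		return true
	default:
		return false
	}
}

// shutdown stops accepting jobs and waits for queued ones to finish.
func (q *taskQueue) shutdown() {
	close(q.jobs)
	<-q.done
}

func main() {
	q := newTaskQueue(4)
	for i := 1; i <= 3; i++ {
		n := i
		q.submit(func() { fmt.Println("running job", n) })
	}
	q.shutdown()
}
```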
---
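## Security Recommendations
### 1. Production Deployment
- Add HTTPS support
- Implement API authentication (JWT/OAuth)
- Restrict allowed CORS origins
- Add request rate limiting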
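
Short of JWT or OAuth, token authentication can be sketched with an in-memory session store. The version below guards its map with a mutex, since HTTP handlers run concurrently, and expires sessions on lookup. All names (`sessionStore`, `create`, `lookup`, `revoke`) are hypothetical, and sessions are lost on restart, so treat this as an illustration rather than a production design.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

type session struct {
	userID int
	expiry time.Time
}

// sessionStore is a mutex-guarded map from token to session.
type sessionStore struct {
	mu       sync.Mutex
	sessions map[string]session
}

func newSessionStore() *sessionStore {
	return &sessionStore{sessions: make(map[string]session)}
}

// create issues a random 32-byte token (64 hex characters) valid for ttl.
func (s *sessionStore) create(userID int, ttl time.Duration) (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	token := hex.EncodeToString(buf)
	s.mu.Lock()
	s.sessions[token] = session{userID: userID, expiry: time.Now().Add(ttl)}
	s.mu.Unlock()
	return token, nil
}

// lookup returns the user ID for a live token and removes expired ones.
func (s *sessionStore) lookup(token string) (int, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	sess, ok := s.sessions[token]
	if !ok {
		return 0, false
	}
	if time.Now().After(sess.expiry) {
		delete(s.sessions, token)
		return 0, false
	}
	return sess.userID, true
}

// revoke deletes a token, for example on logout.
func (s *sessionStore) revoke(token string) {
	s.mu.Lock()
	delete(s.sessions, token)
	s.mu.Unlock()
}

func main() {
	store := newSessionStore()
	token, err := store.create(7, 24*time.Hour)
	if err != nil {
		fmt.Println("create failed:", err)
		return
	}
	userID, ok := store.lookup(token)
	fmt.Println(len(token), userID, ok)
}
```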
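### 2. Data Security
- Do not expose sensitive information (such as cookies)
- Clean up temporary files regularly
- Back up important data
### 3. Access Control
- Add an IP allowlist
- Implement user permission management
- Keep an audit log of operations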
---
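## FAQ
### Q1: Why is there no response after a task starts?
A: Check that the backend `wechat-crawler.exe` exists and is executable.
### Q2: How do I see detailed error messages?
A: Check the console output in the API server window.
### Q3: Can multiple download tasks run at the same time?
A: No. The current version runs only one task at a time.
### Q4: How do I stop a running task?
A: Close the API server window or restart the server.

---
**Document version**: v1.0.0
**Last updated**: 2025-11-27
**Maintainer**: AI Assistant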

BIN
backend/api/api-server.exe Normal file

Binary file not shown.

BIN
backend/api/api-server.exe~ Normal file

Binary file not shown.

@@ -5,8 +5,8 @@ echo 📦 编译 API 服务器
echo ===============================================
echo.
-echo 🔨 正在编译 api_server.exe...
-go build -o api_server.exe server.go
+echo 🔨 正在编译 api-server.exe...
+go build -o api-server.exe server.go
if %errorlevel% neq 0 (
echo.
@@ -18,7 +18,7 @@ if %errorlevel% neq 0 (
echo.
echo ✅ 编译成功!
-echo 📁 输出文件: api_server.exe
+echo 📁 输出文件: api-server.exe
echo.
echo ===============================================
echo 编译完成

File diff suppressed because one or more lines are too long

@@ -1,15 +1,21 @@
package main
import (
"crypto/rand"
"encoding/hex"
"encoding/json"
"fmt"
"log"
"net/http"
"net/url"
"os"
"os/exec"
"path/filepath"
"regexp"
"strings"
"time"
"github.com/wechat-crawler/pkg/wechat"
)
// Response is the unified response envelope
@@ -17,6 +23,7 @@ type Response struct {
Success bool `json:"success"`
Message string `json:"message"`
Data interface{} `json:"data,omitempty"`
Code int `json:"code,omitempty"`
}
// TaskStatus describes the state of the current task
@@ -27,7 +34,28 @@ type TaskStatus struct {
Error string `json:"error,omitempty"`
}
// LoginRequest is the user login request body
type LoginRequest struct {
Username string `json:"username"`
Password string `json:"password"`
}
// RegisterRequest is the user registration request body
type RegisterRequest struct {
Username string `json:"username"`
Password string `json:"password"`
Email string `json:"email"`
}
// Session is an in-memory login session
type Session struct {
Token string
UserID int
Expiry time.Time
}
var currentTask = &TaskStatus{Running: false}
var sessions = make(map[string]*Session)
func main() {
// Enable CORS
@@ -36,10 +64,18 @@ func main() {
http.HandleFunc("/api/article/download", corsMiddleware(downloadArticleHandler))
http.HandleFunc("/api/article/list", corsMiddleware(getArticleListHandler))
http.HandleFunc("/api/article/batch", corsMiddleware(batchDownloadHandler))
http.HandleFunc("/api/article/detail", corsMiddleware(getArticleDetailHandler))
http.HandleFunc("/api/data/list", corsMiddleware(getDataListHandler))
http.HandleFunc("/api/task/status", corsMiddleware(getTaskStatusHandler))
http.HandleFunc("/api/download/", corsMiddleware(downloadFileHandler))
// User authentication endpoints
http.HandleFunc("/api/user/register", corsMiddleware(registerHandler))
http.HandleFunc("/api/user/login", corsMiddleware(loginHandler))
http.HandleFunc("/api/user/logout", corsMiddleware(logoutHandler))
http.HandleFunc("/api/user/info", corsMiddleware(getUserInfoHandler))
http.HandleFunc("/api/user/update", corsMiddleware(updateUserHandler))
port := ":8080"
fmt.Println("===============================================")
fmt.Println(" 🚀 微信公众号文章爬虫 API 服务器")
@@ -58,7 +94,7 @@ func corsMiddleware(next http.HandlerFunc) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Access-Control-Allow-Origin", "*")
w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
-w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
+w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Authorization")
if r.Method == "OPTIONS" {
w.WriteHeader(http.StatusOK)
@@ -98,6 +134,9 @@ func handleRoot(w http.ResponseWriter, r *http.Request) {
<div class="endpoint">
<span class="method">POST</span> /api/article/list - 获取文章列表
</div>
<div class="endpoint">
<span class="method">POST</span> /api/article/detail - 获取文章详情(阅读量、点赞数、评论等)
</div>
<div class="endpoint">
<span class="method">POST</span> /api/article/batch - 批量下载文章
</div>
@@ -216,12 +255,12 @@ func getArticleListHandler(w http.ResponseWriter, r *http.Request) {
currentTask.Progress = 0
currentTask.Message = "正在获取文章列表..."
-// Run the crawler synchronously (feature 3)
+// Run the crawler synchronously (feature 2: get the article list)
exePath := filepath.Join("..", "wechat-crawler.exe")
absPath, _ := filepath.Abs(exePath)
workDir, _ := filepath.Abs("..")
-log.Printf("启动功能3: %s, 工作目录: %s", absPath, workDir)
+log.Printf("启动功能2: %s, 工作目录: %s", absPath, workDir)
cmd := exec.Command(absPath)
cmd.Dir = workDir
@@ -242,8 +281,8 @@ func getArticleListHandler(w http.ResponseWriter, r *http.Request) {
return
}
-// Send option "3" (feature 3: get the article list via access_token)
-fmt.Fprintln(stdin, "3")
+// Send option "2" (feature 2: get the article list via access_token)
+fmt.Fprintln(stdin, "2")
fmt.Fprintln(stdin, req.AccessToken)
if req.Pages > 0 {
fmt.Fprintf(stdin, "%d\n", req.Pages)
@@ -445,6 +484,304 @@ func batchDownloadHandler(w http.ResponseWriter, r *http.Request) {
})
}
// getArticleDetailHandler fetches article details (feature 4), including read counts, likes, and comments
func getArticleDetailHandler(w http.ResponseWriter, r *http.Request) {
var req struct {
AccessToken string `json:"access_token"`
Pages int `json:"pages"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
log.Printf("❌ 解析请求失败: %v", err)
writeJSON(w, Response{Success: false, Message: "请求参数错误: " + err.Error()})
return
}
if req.AccessToken == "" {
log.Printf("❌ Access Token 为空")
writeJSON(w, Response{Success: false, Message: "请输入Access Token URL"})
return
}
log.Printf("\n" + strings.Repeat("=", 60))
log.Printf("📊 开始获取文章详情功能")
log.Printf("接收到的 Access Token: %s", req.AccessToken[:min(100, len(req.AccessToken))])
log.Printf("获取页数: %d (0表示全部)", req.Pages)
currentTask.Running = true
currentTask.Progress = 0
currentTask.Message = "正在解析Access Token参数..."
// Extract the parameters from the access token URL
params, err := parseAccessToken(req.AccessToken)
if err != nil {
log.Printf("❌ 解析Access Token失败: %v", err)
currentTask.Running = false
writeJSON(w, Response{Success: false, Message: "Access Token 参数格式错误: " + err.Error()})
return
}
log.Printf("✅ 参数解析成功:")
log.Printf(" - biz: %s", params["biz"][:min(20, len(params["biz"]))])
log.Printf(" - uin: %s", params["uin"])
log.Printf(" - key: %s", params["key"][:min(20, len(params["key"]))])
log.Printf(" - pass_ticket: %s", params["pass_ticket"][:min(20, len(params["pass_ticket"]))])
// Create the crawler instance
log.Printf("🔧 创建爬虫实例...")
crawler, err := wechat.NewWechatCrawler(
params["biz"],
params["uin"],
params["key"],
params["pass_ticket"],
nil,
)
if err != nil {
log.Printf("❌ 创建爬虫实例失败: %v", err)
currentTask.Running = false
writeJSON(w, Response{Success: false, Message: "创建爬虫实例失败: " + err.Error()})
return
}
log.Printf("✅ 爬虫实例创建成功")
currentTask.Progress = 20
currentTask.Message = "正在获取公众号名称..."
// Fetch the official account name
log.Printf("📱 获取公众号名称...")
officialName, err := crawler.GetOfficialAccountName()
if err != nil {
log.Printf("❌ 获取公众号名称失败: %v", err)
currentTask.Running = false
writeJSON(w, Response{Success: false, Message: "获取公众号名称失败: " + err.Error()})
return
}
log.Printf("✅ 公众号名称: %s", officialName)
currentTask.Progress = 40
currentTask.Message = "正在获取文章列表..."
// Fetch the article list
log.Printf("📋 获取文章列表...")
var articleList [][]string
if req.Pages > 0 {
// Fetch only the requested number of pages
log.Printf("📄 限制获取前 %d 页", req.Pages)
for offset := 0; offset < req.Pages; offset++ {
result, e := crawler.GetNextList(offset)
if e != nil {
log.Printf("❌ 获取第 %d 页失败: %v", offset+1, e)
err = e
break
}
// Check whether the page returned any data
mFlag, ok := result["m_flag"].(int)
if !ok {
if mFlagFloat, ok := result["m_flag"].(float64); ok {
mFlag = int(mFlagFloat)
}
}
if mFlag == 0 {
log.Printf(" 第 %d 页无更多数据", offset+1)
break
}
// Collect the article list of the current page
log.Printf("📝 尝试从 result 中提取 passage_list...")
// Try [][]string first (the type GetNextList actually returns)
if passageListStr, ok := result["passage_list"].([][]string); ok {
log.Printf("✅ passage_list 提取成功([][]string包含 %d 个元素", len(passageListStr))
for idx, strArr := range passageListStr {
articleList = append(articleList, strArr)
log.Printf("✅ 添加第 %d 篇文章: %v", idx+1, strArr)
}
} else if passageList, ok := result["passage_list"].([]interface{}); ok {
// Fallback: try []interface{}
log.Printf("✅ passage_list 提取成功([]interface{}),包含 %d 个元素", len(passageList))
for idx, item := range passageList {
if arr, ok := item.([]interface{}); ok {
strArr := make([]string, len(arr))
for i, v := range arr {
if s, ok := v.(string); ok {
strArr[i] = s
}
}
articleList = append(articleList, strArr)
log.Printf("✅ 添加第 %d 篇文章: %v", idx+1, strArr)
} else {
log.Printf("❌ 第 %d 个 item 不是 []interface{} 类型,实际类型: %T", idx+1, item)
}
}
} else {
log.Printf("❌ passage_list 类型断言失败,实际类型: %T", result["passage_list"])
}
log.Printf("✅ 已获取第 %d/%d 页,当前累计 %d 篇文章", offset+1, req.Pages, len(articleList))
// Pause between pages
if offset < req.Pages-1 {
time.Sleep(2 * time.Second)
}
}
// Transform the article links
log.Printf("🔗 转换文章链接...转换前共 %d 篇", len(articleList))
articleList = crawler.TransformLinks(articleList)
log.Printf("✅ 链接转换完成,共 %d 篇文章", len(articleList))
} else {
// Fetch all articles
log.Printf("📄 获取全部文章")
articleList, err = crawler.GetArticleList()
}
if err != nil {
log.Printf("❌ 获取文章列表失败: %v", err)
currentTask.Running = false
writeJSON(w, Response{Success: false, Message: "获取文章列表失败: " + err.Error()})
return
}
if len(articleList) == 0 {
log.Printf("⚠️ 文章列表为空")
currentTask.Running = false
writeJSON(w, Response{Success: false, Message: "公众号文章列表为空,可能是 Access Token 无效或公众号无文章"})
return
}
log.Printf("✅ 获取到 %d 篇文章", len(articleList))
currentTask.Progress = 60
currentTask.Message = fmt.Sprintf("正在获取文章详情 (0/%d)...", len(articleList))
// Create the output directory
dataDir := "../data"
officialPath := filepath.Join(dataDir, officialName)
log.Printf("📁 创建保存目录: %s", officialPath)
if err := os.MkdirAll(officialPath, 0755); err != nil {
log.Printf("❌ 创建保存目录失败: %v", err)
currentTask.Running = false
writeJSON(w, Response{Success: false, Message: "创建保存目录失败: " + err.Error()})
return
}
// Fetch the article details
log.Printf("📊 开始获取文章详情数据...")
err = crawler.GetDetailList(articleList, officialPath)
if err != nil {
log.Printf("❌ 获取文章详情失败: %v", err)
currentTask.Running = false
writeJSON(w, Response{Success: false, Message: "获取文章详情失败: " + err.Error()})
return
}
log.Printf("✅ 文章详情获取完成")
currentTask.Running = false
currentTask.Progress = 100
currentTask.Message = "文章详情获取完成"
// Count the article detail files
detailPath := filepath.Join(officialPath, "文章详细")
var detailFiles []string
if entries, err := os.ReadDir(detailPath); err == nil {
for _, entry := range entries {
if !entry.IsDir() && strings.HasSuffix(entry.Name(), "_文章详情.txt") {
detailFiles = append(detailFiles, entry.Name())
}
}
}
if len(detailFiles) == 0 {
// Fall back to the account's top-level directory
log.Printf("⚠️ 文章详细目录下未找到文件,检查主目录...")
if entries, err := os.ReadDir(officialPath); err == nil {
for _, entry := range entries {
if !entry.IsDir() && strings.HasSuffix(entry.Name(), "_文章详情.txt") {
detailFiles = append(detailFiles, entry.Name())
}
}
}
}
log.Printf("✅ 找到 %d 个文章详情文件", len(detailFiles))
log.Printf(strings.Repeat("=", 60) + "\n")
writeJSON(w, Response{
Success: true,
Message: fmt.Sprintf("文章详情获取成功,共 %d 篇文章", len(detailFiles)),
Data: map[string]interface{}{
"account": officialName,
"articleCount": len(detailFiles),
"path": officialPath,
},
})
}
// min returns the smaller of two ints (it shadows the built-in min available since Go 1.21)
func min(a, b int) int {
if a < b {
return a
}
return b
}
// parseAccessToken extracts the access token parameters from a URL
func parseAccessToken(accessToken string) (map[string]string, error) {
params := make(map[string]string)
// For a full URL, parse the query string
if strings.HasPrefix(accessToken, "http://") || strings.HasPrefix(accessToken, "https://") {
parsedURL, err := url.Parse(accessToken)
if err != nil {
return nil, fmt.Errorf("URL格式错误: %v", err)
}
query := parsedURL.Query()
params["biz"] = query.Get("__biz")
params["uin"] = query.Get("uin")
params["key"] = query.Get("key")
params["pass_ticket"] = query.Get("pass_ticket")
} else {
// Otherwise, try to extract the parameters with regular expressions
bizRegex := regexp.MustCompile(`__biz=([^&]+)`)
if match := bizRegex.FindStringSubmatch(accessToken); len(match) > 1 {
params["biz"] = match[1]
}
uinRegex := regexp.MustCompile(`uin=([^&]+)`)
if match := uinRegex.FindStringSubmatch(accessToken); len(match) > 1 {
params["uin"] = match[1]
}
keyRegex := regexp.MustCompile(`key=([^&]+)`)
if match := keyRegex.FindStringSubmatch(accessToken); len(match) > 1 {
params["key"] = match[1]
}
passTicketRegex := regexp.MustCompile(`pass_ticket=([^&]+)`)
if match := passTicketRegex.FindStringSubmatch(accessToken); len(match) > 1 {
params["pass_ticket"] = match[1]
}
}
// Validate the required parameters
if params["biz"] == "" {
return nil, fmt.Errorf("缺少__biz参数")
}
if params["uin"] == "" {
return nil, fmt.Errorf("缺少uin参数")
}
if params["key"] == "" {
return nil, fmt.Errorf("缺少key参数")
}
if params["pass_ticket"] == "" {
return nil, fmt.Errorf("缺少pass_ticket参数")
}
return params, nil
}
// getDataListHandler returns the data list
func getDataListHandler(w http.ResponseWriter, r *http.Request) {
dataDir := "../data"
@@ -541,3 +878,348 @@ func writeJSON(w http.ResponseWriter, data interface{}) {
w.Header().Set("Content-Type", "application/json; charset=utf-8")
json.NewEncoder(w).Encode(data)
}
// generateToken returns a random hex token
func generateToken() string {
b := make([]byte, 32)
rand.Read(b)
return hex.EncodeToString(b)
}
// callPythonScript runs a Python script and returns its combined output
func callPythonScript(scriptPath string, args ...string) (string, error) {
// Build the Python command
cmdArgs := append([]string{scriptPath}, args...)
cmd := exec.Command("python", cmdArgs...)
// Run the script from the database directory
dbDir, _ := filepath.Abs(filepath.Join("..", "..", "database"))
cmd.Dir = dbDir
// Run the command
output, err := cmd.CombinedOutput()
if err != nil {
// Return the captured output too, so callers can inspect the script's error text
return string(output), fmt.Errorf("%s: %s", err, string(output))
}
return string(output), nil
}
// registerHandler handles user registration
func registerHandler(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
writeJSON(w, Response{Success: false, Message: "仅支持POST请求", Code: 405})
return
}
var req RegisterRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeJSON(w, Response{Success: false, Message: "请求参数错误", Code: 400})
return
}
// Validate the input
if req.Username == "" || req.Password == "" || req.Email == "" {
writeJSON(w, Response{Success: false, Message: "用户名、密码和邮箱不能为空", Code: 400})
return
}
// Call the Python script to create the user
scriptPath := "user_cli.py"
args := []string{"create", req.Username, req.Password, req.Email}
output, err := callPythonScript(scriptPath, args...)
if err != nil {
log.Printf("注册失败: %v, 输出: %s", err, output)
// Classify the error
if strings.Contains(output, "用户名已存在") || strings.Contains(output, "邮箱已被注册") {
writeJSON(w, Response{Success: false, Message: "用户名或邮箱已存在", Code: 409})
} else if strings.Contains(output, "验证错误") {
writeJSON(w, Response{Success: false, Message: output, Code: 400})
} else {
writeJSON(w, Response{Success: false, Message: "注册失败", Code: 500})
}
return
}
log.Printf("用户注册成功: %s", req.Username)
writeJSON(w, Response{
Success: true,
Message: "注册成功",
Code: 200,
Data: map[string]interface{}{
"username": req.Username,
},
})
}
// loginHandler handles user login
func loginHandler(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
writeJSON(w, Response{Success: false, Message: "仅支持POST请求", Code: 405})
return
}
var req LoginRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeJSON(w, Response{Success: false, Message: "请求参数错误", Code: 400})
return
}
// Validate the input
if req.Username == "" || req.Password == "" {
writeJSON(w, Response{Success: false, Message: "用户名和密码不能为空", Code: 400})
return
}
// Call the Python script to verify the user
scriptPath := "user_cli.py"
args := []string{"verify", req.Username, req.Password}
output, err := callPythonScript(scriptPath, args...)
log.Printf("🔍 Python输出: %s", output)
if err != nil {
log.Printf("❌ 登录失败: %v", err)
writeJSON(w, Response{Success: false, Message: "用户名或密码错误", Code: 401})
return
}
// Generate the token
token := generateToken()
// Parse user_id and the user info from the script output
var userData map[string]interface{}
if err := json.Unmarshal([]byte(output), &userData); err != nil {
log.Printf("❌ 解析用户数据失败: %v, 输出: %s", err, output)
writeJSON(w, Response{Success: false, Message: "服务器内部错误", Code: 500})
return
}
// Check whether verification succeeded
if success, ok := userData["success"].(bool); !ok || !success {
log.Printf("❌ 用户验证失败: %v", userData)
writeJSON(w, Response{Success: false, Message: "用户名或密码错误", Code: 401})
return
}
userID := 0
if uid, ok := userData["user_id"].(float64); ok {
userID = int(uid)
}
// Store the session
sessions[token] = &Session{
Token: token,
UserID: userID,
Expiry: time.Now().Add(24 * time.Hour), // expires after 24 hours
}
log.Printf("✅ 用户登录成功: %s, token: %s", req.Username, token)
// Build user_info (without password fields or the success flag)
userInfo := make(map[string]interface{})
for k, v := range userData {
if k != "password_hash" && k != "success" {
userInfo[k] = v
}
}
writeJSON(w, Response{
Success: true,
Message: "登录成功",
Code: 200,
Data: map[string]interface{}{
"token": token,
"user_id": userID,
"user_info": userInfo,
},
})
}
// logoutHandler handles user logout
func logoutHandler(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
writeJSON(w, Response{Success: false, Message: "仅支持POST请求", Code: 405})
return
}
// Read the token from the Authorization header
token := r.Header.Get("Authorization")
if token == "" {
var req struct {
Token string `json:"token"`
}
json.NewDecoder(r.Body).Decode(&req)
token = req.Token
}
if token == "" {
writeJSON(w, Response{Success: false, Message: "Token不能为空", Code: 400})
return
}
// Delete the session
delete(sessions, token)
log.Printf("用户登出成功, token: %s", token)
writeJSON(w, Response{
Success: true,
Message: "登出成功",
Code: 200,
})
}
// getUserInfoHandler returns the current user's info
func getUserInfoHandler(w http.ResponseWriter, r *http.Request) {
if r.Method != "GET" {
writeJSON(w, Response{Success: false, Message: "仅支持GET请求", Code: 405})
return
}
// Read the token from the Authorization header
token := r.Header.Get("Authorization")
if token == "" {
token = r.URL.Query().Get("token")
}
if token == "" {
writeJSON(w, Response{Success: false, Message: "Token不能为空", Code: 401})
return
}
// Validate the session
session, ok := sessions[token]
if !ok || session.Expiry.Before(time.Now()) {
if ok {
delete(sessions, token) // remove the expired session
}
writeJSON(w, Response{Success: false, Message: "Token无效或已过期", Code: 401})
return
}
// Call the Python script to fetch the user info
scriptPath := "user_cli.py"
args := []string{"get", fmt.Sprintf("%d", session.UserID)}
output, err := callPythonScript(scriptPath, args...)
if err != nil {
log.Printf("获取用户信息失败: %v", err)
writeJSON(w, Response{Success: false, Message: "获取用户信息失败", Code: 500})
return
}
// Parse the user info
var userData map[string]interface{}
if err := json.Unmarshal([]byte(output), &userData); err != nil {
log.Printf("解析用户信息失败: %v", err)
writeJSON(w, Response{Success: false, Message: "解析用户信息失败", Code: 500})
return
}
// Strip the password hash
delete(userData, "password_hash")
writeJSON(w, Response{
Success: true,
Message: "获取成功",
Code: 200,
Data: userData,
})
}
// updateUserHandler updates the current user's info
func updateUserHandler(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
writeJSON(w, Response{Success: false, Message: "仅支持POST请求", Code: 405})
return
}
// Read the token from the Authorization header
token := r.Header.Get("Authorization")
if token == "" {
writeJSON(w, Response{Success: false, Message: "Token不能为空", Code: 401})
return
}
// Validate the session
session, ok := sessions[token]
if !ok || session.Expiry.Before(time.Now()) {
if ok {
delete(sessions, token) // remove the expired session
}
writeJSON(w, Response{Success: false, Message: "Token无效或已过期", Code: 401})
return
}
// Parse the request body
var req struct {
UserID int `json:"user_id"`
Email string `json:"email"`
Bio string `json:"bio"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
log.Printf("❌ 解析请求体失败: %v", err)
writeJSON(w, Response{Success: false, Message: "请求参数错误", Code: 400})
return
}
log.Printf("🔍 更新用户信息: user_id=%d, email=%s", req.UserID, req.Email)
// Ensure the user ID matches the session
if req.UserID != session.UserID {
log.Printf("❌ 用户ID不匹配: req=%d, session=%d", req.UserID, session.UserID)
writeJSON(w, Response{Success: false, Message: "无权操作", Code: 403})
return
}
// Call the Python script to update the user info
scriptPath := "user_cli.py"
args := []string{"update", fmt.Sprintf("%d", req.UserID)}
// Append the fields to update
if req.Email != "" {
args = append(args, "--email", req.Email)
}
if req.Bio != "" {
args = append(args, "--bio", req.Bio)
}
output, err := callPythonScript(scriptPath, args...)
log.Printf("🔍 Python输出: %s", output)
if err != nil {
log.Printf("❌ 更新用户信息失败: %v", err)
writeJSON(w, Response{Success: false, Message: "更新失败", Code: 500})
return
}
// Parse the script response
var result map[string]interface{}
if err := json.Unmarshal([]byte(output), &result); err != nil {
log.Printf("❌ 解析响应失败: %v", err)
writeJSON(w, Response{Success: false, Message: "服务器内部错误", Code: 500})
return
}
// Check whether the update succeeded
if success, ok := result["success"].(bool); !ok || !success {
errMsg := "更新失败"
if msg, ok := result["error"].(string); ok {
errMsg = msg
}
writeJSON(w, Response{Success: false, Message: errMsg, Code: 500})
return
}
log.Printf("✅ 用户信息更新成功: user_id=%d", req.UserID)
writeJSON(w, Response{
Success: true,
Message: "更新成功",
Code: 200,
})
}

@@ -2,8 +2,8 @@
chcp 65001 >nul
title 微信公众号文章爬虫 - API服务器
-:: Check whether api_server.exe exists
-if not exist "api_server.exe" (
+:: Check whether api-server.exe exists
+if not exist "api-server.exe" (
echo ===============================================
echo ⚠️ API服务器未编译
echo ===============================================
@@ -20,4 +20,4 @@ if not exist "api_server.exe" (
:: Start the API server
cls
-api_server.exe
+api-server.exe

@@ -205,7 +205,6 @@ func startCrawling(cfg *configs.Config) {
ReadCount: stats["read_num"],
LikeCount: stats["old_like_num"],
ShareCount: stats["share_num"],
-ShowRead: stats["show_read"],
Comments: comments,
CommentLikes: commentLikes,
CommentID: commentID,

Binary file not shown.

22
backend/go.mod Normal file

@@ -0,0 +1,22 @@
module github.com/wechat-crawler
go 1.24.0
require (
github.com/go-resty/resty/v2 v2.17.0
modernc.org/sqlite v1.40.1
)
require (
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/ncruces/go-strftime v0.1.9 // indirect
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b // indirect
golang.org/x/net v0.43.0 // indirect
golang.org/x/sys v0.36.0 // indirect
modernc.org/libc v1.66.10 // indirect
modernc.org/mathutil v1.7.1 // indirect
modernc.org/memory v1.11.0 // indirect
)

57
backend/go.sum Normal file

@@ -0,0 +1,57 @@
github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
github.com/go-resty/resty/v2 v2.17.0 h1:pW9DeXcaL4Rrym4EZ8v7L19zZiIlWPg5YXAcVmt+gN0=
github.com/go-resty/resty/v2 v2.17.0/go.mod h1:kCKZ3wWmwJaNc7S29BRtUhJwy7iqmn+2mLtQrOyQlVA=
github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e h1:ijClszYn+mADRFY17kjQEVQ1XRhq2/JR1M3sGqeJoxs=
github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/gorilla/mux v1.8.1 h1:TuBL49tXwgrFYWhqrNgrUNEY92u81SPhu7sTdzQEiWY=
github.com/gorilla/mux v1.8.1/go.mod h1:AKf9I4AEqPTmMytcMc0KkNouC66V3BtZ4qD5fmWSiMQ=
github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
github.com/ncruces/go-strftime v0.1.9 h1:bY0MQC28UADQmHmaF5dgpLmImcShSi2kHU9XLdhx/f4=
github.com/ncruces/go-strftime v0.1.9/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE=
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b h1:M2rDM6z3Fhozi9O7NWsxAkg/yqS/lQJ6PmkyIV3YP+o=
golang.org/x/exp v0.0.0-20250620022241-b7579e27df2b/go.mod h1:3//PLf8L/X+8b4vuAfHzxeRUl04Adcb341+IGKfnqS8=
golang.org/x/mod v0.27.0 h1:kb+q2PyFnEADO2IEF935ehFUXlWiNjJWtRNgBLSfbxQ=
golang.org/x/mod v0.27.0/go.mod h1:rWI627Fq0DEoudcK+MBkNkCe0EetEaDSwJJkCcjpazc=
golang.org/x/net v0.43.0 h1:lat02VYK2j4aLzMzecihNvTlJNQUq316m2Mr9rnM6YE=
golang.org/x/net v0.43.0/go.mod h1:vhO1fvI4dGsIjh73sWfUVjj3N7CA9WkKJNQm2svM6Jg=
golang.org/x/sync v0.16.0 h1:ycBJEhp9p4vXvUZNszeOq0kGTPghopOL8q0fq3vstxw=
golang.org/x/sync v0.16.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.36.0 h1:KVRy2GtZBrk1cBYA7MKu5bEZFxQk4NIDV6RLVcC8o0k=
golang.org/x/sys v0.36.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/time v0.12.0 h1:ScB/8o8olJvc+CQPWrK3fPZNfh7qgwCrY0zJmoEQLSE=
golang.org/x/time v0.12.0/go.mod h1:CDIdPxbZBQxdj6cxyCIdrNogrJKMJ7pr37NYpMcMDSg=
golang.org/x/tools v0.36.0 h1:kWS0uv/zsvHEle1LbV5LE8QujrxB3wfQyxHfhOk0Qkg=
golang.org/x/tools v0.36.0/go.mod h1:WBDiHKJK8YgLHlcQPYQzNCkUxUypCaa5ZegCVutKm+s=
modernc.org/cc/v4 v4.26.5 h1:xM3bX7Mve6G8K8b+T11ReenJOT+BmVqQj0FY5T4+5Y4=
modernc.org/cc/v4 v4.26.5/go.mod h1:uVtb5OGqUKpoLWhqwNQo/8LwvoiEBLvZXIQ/SmO6mL0=
modernc.org/ccgo/v4 v4.28.1 h1:wPKYn5EC/mYTqBO373jKjvX2n+3+aK7+sICCv4Fjy1A=
modernc.org/ccgo/v4 v4.28.1/go.mod h1:uD+4RnfrVgE6ec9NGguUNdhqzNIeeomeXf6CL0GTE5Q=
modernc.org/fileutil v1.3.40 h1:ZGMswMNc9JOCrcrakF1HrvmergNLAmxOPjizirpfqBA=
modernc.org/fileutil v1.3.40/go.mod h1:HxmghZSZVAz/LXcMNwZPA/DRrQZEVP9VX0V4LQGQFOc=
modernc.org/gc/v2 v2.6.5 h1:nyqdV8q46KvTpZlsw66kWqwXRHdjIlJOhG6kxiV/9xI=
modernc.org/gc/v2 v2.6.5/go.mod h1:YgIahr1ypgfe7chRuJi2gD7DBQiKSLMPgBQe9oIiito=
modernc.org/goabi0 v0.2.0 h1:HvEowk7LxcPd0eq6mVOAEMai46V+i7Jrj13t4AzuNks=
modernc.org/goabi0 v0.2.0/go.mod h1:CEFRnnJhKvWT1c1JTI3Avm+tgOWbkOu5oPA8eH8LnMI=
modernc.org/libc v1.66.10 h1:yZkb3YeLx4oynyR+iUsXsybsX4Ubx7MQlSYEw4yj59A=
modernc.org/libc v1.66.10/go.mod h1:8vGSEwvoUoltr4dlywvHqjtAqHBaw0j1jI7iFBTAr2I=
modernc.org/mathutil v1.7.1 h1:GCZVGXdaN8gTqB1Mf/usp1Y/hSqgI2vAGGP4jZMCxOU=
modernc.org/mathutil v1.7.1/go.mod h1:4p5IwJITfppl0G4sUEDtCr4DthTaT47/N3aT6MhfgJg=
modernc.org/memory v1.11.0 h1:o4QC8aMQzmcwCK3t3Ux/ZHmwFPzE6hf2Y5LbkRs+hbI=
modernc.org/memory v1.11.0/go.mod h1:/JP4VbVC+K5sU2wZi9bHoq2MAkCnrt2r98UGeSK7Mjw=
modernc.org/opt v0.1.4 h1:2kNGMRiUjrp4LcaPuLY2PzUfqM/w9N23quVwhKt5Qm8=
modernc.org/opt v0.1.4/go.mod h1:03fq9lsNfvkYSfxrfUhZCWPk1lm4cq4N+Bh//bEtgns=
modernc.org/sortutil v1.2.1 h1:+xyoGf15mM3NMlPDnFqrteY07klSFxLElE2PVuWIJ7w=
modernc.org/sortutil v1.2.1/go.mod h1:7ZI3a3REbai7gzCLcotuw9AC4VZVpYMjDzETGsSMqJE=
modernc.org/sqlite v1.40.1 h1:VfuXcxcUWWKRBuP8+BR9L7VnmusMgBNNnBYGEe9w/iY=
modernc.org/sqlite v1.40.1/go.mod h1:9fjQZ0mB1LLP0GYrp39oOJXx/I2sxEnZtzCmEQIKvGE=
modernc.org/strutil v1.2.1 h1:UneZBkQA+DX2Rp35KcM69cSsNES9ly8mQWD71HKlOA0=
modernc.org/strutil v1.2.1/go.mod h1:EHkiggD70koQxjVdSBM3JKM7k6L0FbGE5eymy9i3B9A=
modernc.org/token v1.1.0 h1:Xl7Ap9dKaEs5kLoOQeQmPWevfnk/DM5qcLcYlA8ys6Y=
modernc.org/token v1.1.0/go.mod h1:UGzOrNV1mAFSEB63lOFHIpNRUVMvYTc6yu1SMY/XTDM=

Binary file not shown.

@@ -25,7 +25,6 @@ type ArticleDetail struct {
ReadCount string `json:"read_count"`
LikeCount string `json:"like_count"`
ShareCount string `json:"share_count"`
-ShowRead string `json:"show_read"`
Comments []string `json:"comments"`
CommentLikes []string `json:"comment_likes"`
CommentID string `json:"comment_id"`
@@ -1624,7 +1623,6 @@ func (w *WechatCrawler) GetArticleDetail(link string) (*ArticleDetail, error) {
ReadCount: stats["read_num"],
LikeCount: stats["old_like_num"],
ShareCount: stats["share_num"],
-ShowRead: stats["show_read"],
Comments: comments,
CommentLikes: commentLikes,
CommentID: commentID,
@@ -1731,7 +1729,6 @@ func (c *WechatCrawler) SaveArticleDetailToExcel(article *ArticleDetail, filePat
content.WriteString(fmt.Sprintf("阅读量: %s\n", article.ReadCount))
content.WriteString(fmt.Sprintf("点赞数: %s\n", article.LikeCount))
content.WriteString(fmt.Sprintf("转发数: %s\n", article.ShareCount))
-content.WriteString(fmt.Sprintf("在看数: %s\n", article.ShowRead))
content.WriteString(strings.Repeat("=", 80))
content.WriteString("\n\n")

BIN
backend/wechat-crawler.exe Normal file

Binary file not shown.