New version available
460
backend/api/API接口文档.md
Normal file
@@ -0,0 +1,460 @@
# 📡 WeChat Official Account Article Crawler - API Reference

## Server Information

- **Base URL**: http://localhost:8080
- **Protocol**: HTTP/1.1
- **Data format**: JSON
- **Character encoding**: UTF-8
- **CORS**: enabled (all origins allowed)

## Unified Response Format

Every JSON endpoint returns the same envelope. The server returns `message` values in Chinese, exactly as shown in the examples; English glosses are given where they help.

```json
{
  "success": true,        // whether the request succeeded
  "message": "操作成功",   // status message ("operation succeeded")
  "data": {}              // payload (optional)
}
```
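
A Go client could decode this envelope roughly as follows. This is a minimal sketch: the `Envelope` type and the `parseEnvelope` helper are illustrative names, not part of `server.go`. `data` is kept as raw JSON because its shape differs per endpoint.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Envelope mirrors the unified response format (hypothetical client-side type).
type Envelope struct {
	Success bool            `json:"success"`
	Message string          `json:"message"`
	Data    json.RawMessage `json:"data,omitempty"`
}

// parseEnvelope decodes a response body and turns success=false into a Go error.
func parseEnvelope(body []byte) (Envelope, error) {
	var env Envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return env, fmt.Errorf("decode envelope: %w", err)
	}
	if !env.Success {
		return env, errors.New(env.Message)
	}
	return env, nil
}

func main() {
	env, err := parseEnvelope([]byte(`{"success":true,"message":"操作成功","data":{}}`))
	fmt.Println(env.Message, string(env.Data), err)
}
```

Each endpoint-specific payload can then be decoded from `env.Data` with a second `json.Unmarshal` call.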

## Endpoints

### 1. Extract the Official Account Homepage

**Endpoint**: `/api/homepage/extract`
**Method**: POST
**Description**: Extracts the official account's homepage link from an article link.

#### Request Parameters

```json
{
  "url": "https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx"
}
```

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | Yes | Link to an official account article |

#### Response Examples

**Success** (`提取成功` means "extraction succeeded"):
```json
{
  "success": true,
  "message": "提取成功",
  "data": {
    "homepage": "https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=xxx&scene=124",
    "output": "full command-line output of the crawler"
  }
}
```

**Failure** (`未能提取到主页链接` means "could not extract the homepage link"):
```json
{
  "success": false,
  "message": "未能提取到主页链接"
}
```

#### Usage Examples

**jQuery**:
```javascript
$.ajax({
    url: 'http://localhost:8080/api/homepage/extract',
    method: 'POST',
    contentType: 'application/json',
    data: JSON.stringify({
        url: 'https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx'
    }),
    success: function(response) {
        if (response.success) {
            console.log('Homepage link:', response.data.homepage);
        }
    }
});
```

**curl**:
```bash
curl -X POST http://localhost:8080/api/homepage/extract \
  -H "Content-Type: application/json" \
  -d '{"url":"https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx"}'
```
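
The same call can be sketched in Go. The helper names `newExtractRequest` and `homepageFromBody` are hypothetical, and the base URL is assumed to be the local address listed under Server Information; the sketch only builds the request and parses a response body, leaving the actual send to `http.DefaultClient.Do`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newExtractRequest builds the POST request for /api/homepage/extract.
// baseURL is an assumption, for example "http://localhost:8080".
func newExtractRequest(baseURL, articleURL string) (*http.Request, error) {
	payload, err := json.Marshal(map[string]string{"url": articleURL})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/homepage/extract", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// homepageFromBody pulls data.homepage out of a response body.
func homepageFromBody(body []byte) (string, error) {
	var resp struct {
		Success bool   `json:"success"`
		Message string `json:"message"`
		Data    struct {
			Homepage string `json:"homepage"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	if !resp.Success {
		return "", fmt.Errorf("extract failed: %s", resp.Message)
	}
	return resp.Data.Homepage, nil
}

func main() {
	req, err := newExtractRequest("http://localhost:8080", "https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it: resp, err := http.DefaultClient.Do(req), then read the body
	// and pass it to homepageFromBody.
}
```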

---

### 2. Download a Single Article

**Endpoint**: `/api/article/download`
**Method**: POST
**Description**: Downloads the specified article. In the current `server.go` this handler is a placeholder: it acknowledges the request but does not yet invoke the crawler.

#### Request Parameters

```json
{
  "url": "https://mp.weixin.qq.com/s?__biz=xxx",
  "save_image": true,
  "save_content": true
}
```

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| url | string | Yes | Article link |
| save_image | boolean | No | Whether to save images (default: false) |
| save_content | boolean | No | Whether to save the article content (intended default: true; the current handler decodes an omitted field as false, so send it explicitly) |

#### Response Example

`下载任务已启动` means "download task started".

```json
{
  "success": true,
  "message": "下载任务已启动",
  "data": {
    "url": "https://mp.weixin.qq.com/s?__biz=xxx"
  }
}
```
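
Go's `encoding/json` leaves omitted fields at their zero value, so a plain `bool` field cannot express a default of true. One way to honor the defaults in the table is to decode into pointer fields and fill in defaults afterwards. The sketch below is illustrative only: `downloadRequest`, `downloadOptions`, and `decodeDownloadRequest` are hypothetical names, not code from `server.go`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// downloadRequest uses pointers so that "omitted" and "false" can be told apart.
type downloadRequest struct {
	URL         string `json:"url"`
	SaveImage   *bool  `json:"save_image"`
	SaveContent *bool  `json:"save_content"`
}

// downloadOptions is the request after defaults have been applied.
type downloadOptions struct {
	URL         string
	SaveImage   bool
	SaveContent bool
}

// decodeDownloadRequest parses the body, requires url, and applies the
// defaults from the parameter table (save_image=false, save_content=true).
func decodeDownloadRequest(body []byte) (downloadOptions, error) {
	var req downloadRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return downloadOptions{}, err
	}
	if req.URL == "" {
		return downloadOptions{}, errors.New("url is required")
	}
	opts := downloadOptions{URL: req.URL, SaveImage: false, SaveContent: true}
	if req.SaveImage != nil {
		opts.SaveImage = *req.SaveImage
	}
	if req.SaveContent != nil {
		opts.SaveContent = *req.SaveContent
	}
	return opts, nil
}

func main() {
	opts, err := decodeDownloadRequest([]byte(`{"url":"https://mp.weixin.qq.com/s?__biz=xxx"}`))
	fmt.Printf("%+v %v\n", opts, err)
}
```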
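
---

### 3. Get the Article List

**Endpoint**: `/api/article/list`
**Method**: POST
**Description**: Fetches the official account's article list in bulk. The request blocks until the crawler has finished, then reports the generated list file.

#### Request Parameters

```json
{
  "access_token": "https://mp.weixin.qq.com/mp/profile_ext?action=xxx&appmsg_token=xxx",
  "pages": 0
}
```

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| access_token | string | Yes | URL that contains `appmsg_token` |
| pages | integer | No | Number of pages to fetch; 0 means all pages (default: 0) |

#### Response Example

On success the handler in `server.go` returns the account directory, the list file it found, and a download path (`文章列表获取成功` means "article list fetched successfully"). The file itself is served by the `/api/download/{account}/{filename}` route.

```json
{
  "success": true,
  "message": "文章列表获取成功",
  "data": {
    "account": "研招网资讯",
    "filename": "xxx.xlsx",
    "download": "/api/download/研招网资讯/xxx.xlsx"
  }
}
```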
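
Because `access_token` must be a URL that carries an `appmsg_token` query parameter, a caller may want to check it before starting a long-running crawl. The following is a minimal sketch; `hasAppmsgToken` is a hypothetical helper and the server itself performs no such check.

```go
package main

import (
	"fmt"
	"net/url"
)

// hasAppmsgToken reports whether raw parses as a URL whose query string
// carries a non-empty appmsg_token parameter.
func hasAppmsgToken(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	return u.Query().Get("appmsg_token") != ""
}

func main() {
	fmt.Println(hasAppmsgToken("https://mp.weixin.qq.com/mp/profile_ext?action=xxx&appmsg_token=xxx"))
	fmt.Println(hasAppmsgToken("https://mp.weixin.qq.com/mp/profile_ext?action=xxx"))
}
```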
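
---

### 4. Batch Download Articles

**Endpoint**: `/api/article/batch`
**Method**: POST
**Description**: Downloads all articles of an official account. The request blocks until the crawler has finished.

#### Request Parameters

```json
{
  "official_account": "account name or article link",
  "save_image": true,
  "save_content": true
}
```

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| official_account | string | Yes | Official account name, or a link to any of its articles |
| save_image | boolean | No | Whether to save images (default: false) |
| save_content | boolean | No | Whether to save the article content (intended default: true; `server.go` currently accepts this flag but does not forward it to the crawler) |

#### Response Example

On success the handler in `server.go` reports how many articles it found in the account's `文章详细` directory (the message means "batch download complete, 125 articles downloaded"):

```json
{
  "success": true,
  "message": "批量下载完成,共下载 125 篇文章",
  "data": {
    "account": "研招网资讯",
    "articleCount": 125,
    "path": "..\\data\\研招网资讯\\文章详细"
  }
}
```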
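
In `server.go`, `batchDownloadHandler` drives the interactive crawler by writing answers to its standard input: menu option `5`, the account, then `y` or `n` for saving images. The sketch below reproduces that input script as a pure function; `batchStdin` is a hypothetical helper, and the newline check is an added precaution (an embedded line break would otherwise be read as an extra answer), not something the current handler does.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// batchStdin returns the text written to the crawler's stdin for a batch
// download: option "5", the account (name or article link), and "y"/"n".
func batchStdin(account string, saveImage bool) (string, error) {
	if account == "" || strings.ContainsAny(account, "\r\n") {
		return "", errors.New("official_account must be a single non-empty line")
	}
	answer := "n"
	if saveImage {
		answer = "y"
	}
	return strings.Join([]string{"5", account, answer}, "\n") + "\n", nil
}

func main() {
	script, err := batchStdin("研招网资讯", true)
	fmt.Printf("%q %v\n", script, err)
}
```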

---

### 5. Get the Data List

**Endpoint**: `/api/data/list`
**Method**: GET
**Description**: Lists the official accounts whose data has already been downloaded.

#### Request Parameters

None

#### Response Example

The field names below are the ones `getDataListHandler` in `server.go` emits (`articleCount`, `lastUpdate`); `path` is relative to the server's working directory.

```json
{
  "success": true,
  "message": "",
  "data": [
    {
      "name": "研招网资讯",
      "articleCount": 125,
      "path": "..\\data\\研招网资讯",
      "lastUpdate": "2025-11-27"
    }
  ]
}
```

| Field | Type | Description |
|-------|------|-------------|
| name | string | Official account name |
| articleCount | integer | Number of articles |
| path | string | Storage path |
| lastUpdate | string | Date of the last update (YYYY-MM-DD) |

#### Usage Examples

**jQuery**:
```javascript
$.get('http://localhost:8080/api/data/list', function(response) {
    if (response.success) {
        console.log('Data list:', response.data);
    }
});
```

**curl**:
```bash
curl http://localhost:8080/api/data/list
```
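
A Go consumer of this endpoint might decode the list as sketched below. The JSON keys are the ones `getDataListHandler` in `server.go` emits (`name`, `articleCount`, `path`, `lastUpdate`); the `account` type and the `parseDataList` and `totalArticles` helpers are hypothetical names introduced for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// account is one entry of the data list.
type account struct {
	Name         string `json:"name"`
	ArticleCount int    `json:"articleCount"`
	Path         string `json:"path"`
	LastUpdate   string `json:"lastUpdate"`
}

// parseDataList decodes the response body of GET /api/data/list.
func parseDataList(body []byte) ([]account, error) {
	var resp struct {
		Success bool      `json:"success"`
		Message string    `json:"message"`
		Data    []account `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if !resp.Success {
		return nil, errors.New(resp.Message)
	}
	return resp.Data, nil
}

// totalArticles sums the article counts of all accounts.
func totalArticles(accounts []account) int {
	total := 0
	for _, a := range accounts {
		total += a.ArticleCount
	}
	return total
}

func main() {
	body := []byte(`{"success":true,"message":"","data":[{"name":"研招网资讯","articleCount":125,"path":"p","lastUpdate":"2025-11-27"}]}`)
	accounts, err := parseDataList(body)
	fmt.Println(len(accounts), totalArticles(accounts), err)
}
```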

---

### 6. Get the Task Status

**Endpoint**: `/api/task/status`
**Method**: GET
**Description**: Returns the execution status of the current task.

#### Request Parameters

None

#### Response Examples

**Task running** (the message means "downloading article 10..."):
```json
{
  "success": true,
  "data": {
    "running": true,
    "progress": 45,
    "message": "正在下载第10篇文章..."
  }
}
```

**No task running**:
```json
{
  "success": true,
  "data": {
    "running": false,
    "progress": 0,
    "message": ""
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| running | boolean | Whether a task is currently running |
| progress | integer | Task progress (0-100) |
| message | string | Description of the task state |
| error | string | Error message (optional) |
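
A client can poll this endpoint until `running` becomes false. The sketch below keeps the polling loop separate from the HTTP call so it can be exercised without a server: `waitForTask` and its `fetch` callback are hypothetical, and `TaskStatus` simply mirrors the fields in the table above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// TaskStatus mirrors the fields of the task status payload.
type TaskStatus struct {
	Running  bool   `json:"running"`
	Progress int    `json:"progress"`
	Message  string `json:"message"`
	Error    string `json:"error,omitempty"`
}

// waitForTask calls fetch every interval until the task stops running,
// the task reports an error, or maxPolls attempts have been made.
func waitForTask(fetch func() (TaskStatus, error), interval time.Duration, maxPolls int) (TaskStatus, error) {
	var last TaskStatus
	for i := 0; i < maxPolls; i++ {
		st, err := fetch()
		if err != nil {
			return last, err
		}
		last = st
		if st.Error != "" {
			return st, errors.New(st.Error)
		}
		if !st.Running {
			return st, nil
		}
		time.Sleep(interval)
	}
	return last, errors.New("task still running after the maximum number of polls")
}

func main() {
	// Fake fetch for demonstration: reports "running" twice, then done.
	calls := 0
	fetch := func() (TaskStatus, error) {
		calls++
		return TaskStatus{Running: calls < 3, Progress: calls * 33}, nil
	}
	st, err := waitForTask(fetch, 10*time.Millisecond, 10)
	fmt.Println(st.Running, calls, err)
}
```

In real use, `fetch` would issue `GET /api/task/status` and decode the `data` field into `TaskStatus`.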
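
---

## Error Codes

### HTTP Status Codes

| Status code | Meaning |
|-------------|---------|
| 200 | Request succeeded |
| 400 | Invalid request parameters |
| 500 | Internal server error |

Note: the JSON handlers in the current `server.go` never set a status code, so business failures also arrive with HTTP 200; always check the `success` field. Only the file download route returns 400 (malformed path) or 404 (file not found).

### Business Errors

All business errors are reported through the `success` and `message` fields of the response:

```json
{
  "success": false,
  "message": "the specific error message"
}
```

Common error messages:

| Message | Meaning | Resolution |
|---------|---------|------------|
| `请求参数错误` (invalid request parameters) | Malformed JSON or a missing required parameter | Check the request body format |
| `执行失败` (execution failed) | The backend program exited with an error | Read the detailed error output |
| `未能提取到主页链接` (could not extract the homepage link) | The article link is malformed or could not be parsed | Use a valid article link |
| `读取数据目录失败` (failed to read the data directory) | The `data` directory does not exist or is not accessible | Check the directory and its permissions |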

---

## Development Guide

### Local Testing

1. **Start the API server**:
```bash
cd backend\api
start_api.bat
```

2. **Test the endpoints**:
```bash
# Test homepage extraction
curl -X POST http://localhost:8080/api/homepage/extract \
  -H "Content-Type: application/json" \
  -d "{\"url\":\"<article link>\"}"

# Test the data list
curl http://localhost:8080/api/data/list
```

### CORS Configuration

The API server enables CORS and allows requests from any origin:

```go
w.Header().Set("Access-Control-Allow-Origin", "*")
w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
```

To restrict access to specific domains, modify the `corsMiddleware` function in `server.go`.
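
One possible way to restrict origins is an allow-list that echoes back only known origins. This is a sketch under assumptions, not the project's implementation: `allowedOrigins`, `originAllowed`, and `restrictedCORS` are hypothetical names, and `http://localhost:3000` stands in for whatever front-end origin is actually used.

```go
package main

import (
	"fmt"
	"net/http"
)

// allowedOrigins is a hypothetical allow-list; replace it with the real
// front-end origins.
var allowedOrigins = map[string]bool{
	"http://localhost:3000": true,
}

// originAllowed reports whether the given Origin header value is on the list.
func originAllowed(origin string) bool {
	return allowedOrigins[origin]
}

// restrictedCORS sets CORS headers only for allow-listed origins and
// answers preflight (OPTIONS) requests without calling the next handler.
func restrictedCORS(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		origin := r.Header.Get("Origin")
		if originAllowed(origin) {
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Set("Vary", "Origin")
			w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
			w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
		}
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusOK)
			return
		}
		next(w, r)
	}
}

func main() {
	fmt.Println(originAllowed("http://localhost:3000"), originAllowed("http://evil.example"))
}
```

Browsers enforce CORS on the client side, so this limits which web pages can read responses; it does not replace authentication.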
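
### Timeouts

The server is started with `http.ListenAndServe`, which applies no read or write timeout. To set timeouts (for example 30 seconds), construct the server explicitly in `server.go`. Keep in mind that the article list and batch download endpoints block until the crawler finishes, so `WriteTimeout` must be longer than the longest expected crawl:

```go
server := &http.Server{
	Addr:         ":8080",
	ReadTimeout:  30 * time.Second,
	WriteTimeout: 30 * time.Second,
}
log.Fatal(server.ListenAndServe())
```

### Logging

The API server logs through Go's standard `log` package, which writes to standard error by default:

```go
log.Printf("[%s] %s - %s", r.Method, r.URL.Path, message)
```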

---

## API Roadmap

### v1.1.0 (planned)
- [ ] Add user authentication
- [ ] Support task queue management
- [ ] Push download progress over WebSocket
- [ ] Provide an article search endpoint

### v1.2.0 (planned)
- [ ] Statistics and analytics endpoints
- [ ] Export (PDF/Word)
- [ ] Batch task management
- [ ] Scheduled tasks

---

## Tech Stack

- **Language**: Go 1.20+
- **Web framework**: net/http (standard library)
- **Data format**: JSON
- **Concurrency model**: goroutines

---

## Performance Notes

### Concurrency
- Multiple clients can access the server at the same time
- Only one crawler task should run at a time: the server tracks a single `currentTask` and neither queues nor rejects overlapping task requests
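
If overlapping tasks need to be rejected rather than merely discouraged, the shared status can be guarded with a mutex. The following is a minimal sketch of that idea; `taskTracker` and its methods are hypothetical and are not how `server.go` currently manages `currentTask`.

```go
package main

import (
	"fmt"
	"sync"
)

// taskTracker guards the single-task state with a mutex so that concurrent
// HTTP handlers cannot start two crawler tasks at once.
type taskTracker struct {
	mu      sync.Mutex
	running bool
	message string
}

// tryStart marks a task as running; it returns false if one is already running.
func (t *taskTracker) tryStart(message string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.running {
		return false
	}
	t.running = true
	t.message = message
	return true
}

// finish marks the current task as done.
func (t *taskTracker) finish(message string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.running = false
	t.message = message
}

// snapshot returns a consistent copy of the state for the status endpoint.
func (t *taskTracker) snapshot() (bool, string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.running, t.message
}

func main() {
	var tracker taskTracker
	fmt.Println(tracker.tryStart("first task"))  // accepted
	fmt.Println(tracker.tryStart("second task")) // rejected while the first runs
	tracker.finish("done")
	fmt.Println(tracker.snapshot())
}
```

A handler would call `tryStart` before launching the crawler, respond with `success: false` when it returns false, and call `finish` once the crawler exits.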

### Resource Usage
- CPU: low (mostly I/O-bound)
- Memory: <50MB
- Disk: depends on the number of downloaded articles

### Optimization Suggestions
1. Use a connection pool for HTTP requests
2. Implement a task queue
3. Cache results
4. Enable gzip compression

---

## Security Recommendations

### 1. Production Deployment
- Add HTTPS support
- Implement API authentication (JWT/OAuth)
- Restrict the allowed CORS origins
- Add request rate limiting

### 2. Data Security
- Do not expose sensitive information (cookies)
- Clean up temporary files regularly
- Back up important data

### 3. Access Control
- Add an IP allow-list
- Implement user permission management
- Keep an operation log

---

## FAQ

### Q1: Why is there no response after a task is started?
A: Check that the backend `wechat-crawler.exe` exists and is executable.

### Q2: How do I see detailed error messages?
A: Check the console output in the API server window.

### Q3: Can several download tasks run at the same time?
A: Not in the current version; only one task can run at a time.

### Q4: How do I stop a running task?
A: Close the API server window or restart the server.

---

**Document version**: v1.0.0
**Last updated**: 2025-11-27
**Maintainer**: AI Assistant
26
backend/api/build.bat
Normal file
@@ -0,0 +1,26 @@
@echo off
chcp 65001 >nul
echo ===============================================
echo    📦 Build the API server
echo ===============================================
echo.

echo 🔨 Building api_server.exe...
go build -o api_server.exe server.go

if %errorlevel% neq 0 (
    echo.
    echo ❌ Build failed!
    echo.
    pause
    exit /b 1
)

echo.
echo ✅ Build succeeded!
echo 📁 Output file: api_server.exe
echo.
echo ===============================================
echo    Build complete
echo ===============================================
pause
543
backend/api/server.go
Normal file
@@ -0,0 +1,543 @@
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"time"
)

// Response is the unified response envelope.
type Response struct {
	Success bool        `json:"success"`
	Message string      `json:"message"`
	Data    interface{} `json:"data,omitempty"`
}

// TaskStatus describes the state of the current crawler task.
type TaskStatus struct {
	Running  bool   `json:"running"`
	Progress int    `json:"progress"`
	Message  string `json:"message"`
	Error    string `json:"error,omitempty"`
}

// currentTask is shared by all handlers; access is not synchronized.
var currentTask = &TaskStatus{Running: false}

func main() {
	// Register the routes, each wrapped with the CORS middleware.
	http.HandleFunc("/", corsMiddleware(handleRoot))
	http.HandleFunc("/api/homepage/extract", corsMiddleware(extractHomepageHandler))
	http.HandleFunc("/api/article/download", corsMiddleware(downloadArticleHandler))
	http.HandleFunc("/api/article/list", corsMiddleware(getArticleListHandler))
	http.HandleFunc("/api/article/batch", corsMiddleware(batchDownloadHandler))
	http.HandleFunc("/api/data/list", corsMiddleware(getDataListHandler))
	http.HandleFunc("/api/task/status", corsMiddleware(getTaskStatusHandler))
	http.HandleFunc("/api/download/", corsMiddleware(downloadFileHandler))

	port := ":8080"
	fmt.Println("===============================================")
	fmt.Println("   🚀 WeChat Official Account Article Crawler API Server")
	fmt.Println("===============================================")
	fmt.Printf("🌐 Server address: http://localhost%s\n", port)
	fmt.Printf("⏰ Started at: %s\n", time.Now().Format("2006-01-02 15:04:05"))
	fmt.Println("===============================================")
	fmt.Println() // blank line; Println already appends the newline

	if err := http.ListenAndServe(port, nil); err != nil {
		log.Fatal("failed to start server: ", err)
	}
}

// corsMiddleware adds permissive CORS headers and answers preflight requests.
func corsMiddleware(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Access-Control-Allow-Origin", "*")
		w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
		w.Header().Set("Access-Control-Allow-Headers", "Content-Type")

		// Preflight requests are answered here and never reach the handler.
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusOK)
			return
		}

		next(w, r)
	}
}

// handleRoot serves a small HTML index page listing the available endpoints.
func handleRoot(w http.ResponseWriter, r *http.Request) {
	// The "/" pattern matches every unregistered path, so reject anything
	// other than the root itself.
	if r.URL.Path != "/" {
		http.NotFound(w, r)
		return
	}

	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	html := `
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>WeChat Official Account Article Crawler API</title>
    <style>
        body { font-family: Arial, sans-serif; max-width: 800px; margin: 50px auto; padding: 20px; }
        h1 { color: #333; }
        .endpoint { background: #f5f5f5; padding: 10px; margin: 10px 0; border-radius: 5px; }
        .method { color: #4CAF50; font-weight: bold; }
    </style>
</head>
<body>
    <h1>🚀 WeChat Official Account Article Crawler API Server</h1>
    <p>Current time: ` + time.Now().Format("2006-01-02 15:04:05") + `</p>
    <h2>Available endpoints:</h2>
    <div class="endpoint">
        <span class="method">POST</span> /api/homepage/extract - Extract the official account homepage
    </div>
    <div class="endpoint">
        <span class="method">POST</span> /api/article/download - Download a single article
    </div>
    <div class="endpoint">
        <span class="method">POST</span> /api/article/list - Get the article list
    </div>
    <div class="endpoint">
        <span class="method">POST</span> /api/article/batch - Batch download articles
    </div>
    <div class="endpoint">
        <span class="method">GET</span> /api/data/list - Get the data list
    </div>
    <div class="endpoint">
        <span class="method">GET</span> /api/task/status - Get the task status
    </div>
</body>
</html>
`
	w.Write([]byte(html))
}

// extractHomepageHandler runs the crawler on an article URL and extracts the
// official account homepage link from its output.
func extractHomepageHandler(w http.ResponseWriter, r *http.Request) {
	var req struct {
		URL string `json:"url"`
	}

	// url is a required parameter, so an empty value is rejected as well.
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.URL == "" {
		writeJSON(w, Response{Success: false, Message: "请求参数错误"})
		return
	}

	// Run the crawler by absolute path, with the parent directory as its
	// working directory.
	exePath := filepath.Join("..", "wechat-crawler.exe")
	absPath, _ := filepath.Abs(exePath)
	log.Printf("Running: %s", absPath)

	cmd := exec.Command(absPath, req.URL)
	workDir, _ := filepath.Abs("..")
	cmd.Dir = workDir
	output, err := cmd.CombinedOutput()

	if err != nil {
		log.Printf("Execution failed: %v, output: %s", err, string(output))
		writeJSON(w, Response{Success: false, Message: "执行失败: " + string(output)})
		return
	}

	// Find the homepage link in the crawler output.
	outputStr := string(output)
	lines := strings.Split(outputStr, "\n")
	var homepageURL string

	for _, line := range lines {
		if strings.Contains(line, "公众号主页链接") || strings.Contains(line, "https://mp.weixin.qq.com/mp/profile_ext") {
			// Take everything from the first "https://" to the end of the line.
			if idx := strings.Index(line, "https://"); idx != -1 {
				homepageURL = strings.TrimSpace(line[idx:])
				break
			}
		}
	}

	if homepageURL == "" {
		writeJSON(w, Response{Success: false, Message: "未能提取到主页链接"})
		return
	}

	writeJSON(w, Response{
		Success: true,
		Message: "提取成功",
		Data: map[string]string{
			"homepage": homepageURL,
			"output":   outputStr,
		},
	})
}

// downloadArticleHandler is a placeholder for downloading a single article;
// the actual download logic still has to be implemented.
func downloadArticleHandler(w http.ResponseWriter, r *http.Request) {
	var req struct {
		URL         string `json:"url"`
		SaveImage   bool   `json:"save_image"`
		SaveContent bool   `json:"save_content"`
	}

	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		writeJSON(w, Response{Success: false, Message: "请求参数错误"})
		return
	}

	// NOTE: the crawler currently has no command-line entry point for
	// downloading a single article, so nothing is started here yet.
	// currentTask is deliberately left untouched: marking it as running with
	// no task to finish it would make /api/task/status report a running task
	// forever.

	writeJSON(w, Response{
		Success: true,
		Message: "下载任务已启动",
		Data: map[string]interface{}{
			"url": req.URL,
		},
	})
}

// getArticleListHandler runs crawler option 3 (fetch the article list via
// access_token) and reports the list file that was generated.
func getArticleListHandler(w http.ResponseWriter, r *http.Request) {
	var req struct {
		AccessToken string `json:"access_token"`
		Pages       int    `json:"pages"`
	}

	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		writeJSON(w, Response{Success: false, Message: "请求参数错误"})
		return
	}

	currentTask.Running = true
	currentTask.Progress = 0
	currentTask.Message = "正在获取文章列表..."

	// Run the crawler synchronously (option 3).
	exePath := filepath.Join("..", "wechat-crawler.exe")
	absPath, _ := filepath.Abs(exePath)
	workDir, _ := filepath.Abs("..")

	log.Printf("Starting option 3: %s, working directory: %s", absPath, workDir)
	cmd := exec.Command(absPath)
	cmd.Dir = workDir

	// Create the stdin pipe used to answer the crawler's prompts.
	stdin, err := cmd.StdinPipe()
	if err != nil {
		log.Printf("Failed to create stdin pipe: %v", err)
		currentTask.Running = false
		writeJSON(w, Response{Success: false, Message: "创建输入管道失败: " + err.Error()})
		return
	}

	// Start the command.
	if err := cmd.Start(); err != nil {
		log.Printf("Failed to start command: %v", err)
		currentTask.Running = false
		writeJSON(w, Response{Success: false, Message: "启动命令失败: " + err.Error()})
		return
	}

	// Answer the prompts: option "3", the token URL, then the page count
	// (0 means all pages).
	fmt.Fprintln(stdin, "3")
	fmt.Fprintln(stdin, req.AccessToken)
	if req.Pages > 0 {
		fmt.Fprintf(stdin, "%d\n", req.Pages)
	} else {
		fmt.Fprintln(stdin, "0")
	}
	stdin.Close()

	// Wait for the command to finish.
	if err := cmd.Wait(); err != nil {
		log.Printf("Command failed: %v", err)
		currentTask.Running = false
		writeJSON(w, Response{Success: false, Message: "命令执行失败: " + err.Error()})
		return
	}

	currentTask.Running = false
	currentTask.Progress = 100
	currentTask.Message = "文章列表获取完成"

	// Locate the generated file so that a download link can be returned.
	dataDir := "../data"
	entries, err := os.ReadDir(dataDir)
	if err != nil {
		writeJSON(w, Response{Success: false, Message: "读取数据目录失败: " + err.Error()})
		return
	}

	// Pick the most recently modified account directory.
	var latestDir string
	var latestTime time.Time
	for _, entry := range entries {
		if !entry.IsDir() {
			continue
		}
		info, err := entry.Info()
		if err != nil {
			continue // the entry vanished or is unreadable; skip it
		}
		if info.ModTime().After(latestTime) {
			latestTime = info.ModTime()
			latestDir = entry.Name()
		}
	}

	if latestDir == "" {
		writeJSON(w, Response{Success: false, Message: "未找到生成的数据目录"})
		return
	}

	log.Printf("Latest directory: %s", latestDir)

	accountPath := filepath.Join(dataDir, latestDir)
	files, err := os.ReadDir(accountPath)
	if err != nil {
		writeJSON(w, Response{Success: false, Message: "读取公众号目录失败: " + err.Error()})
		return
	}

	// Look for the list file (.xlsx or .txt) in priority order: direct links
	// ("直连链接"), then original links ("原始链接"), then any article list
	// ("文章列表").
	var excelFile string
	for _, keyword := range []string{"直连链接", "原始链接", "文章列表"} {
		for _, file := range files {
			name := file.Name()
			if file.IsDir() || !strings.Contains(name, keyword) {
				continue
			}
			if strings.HasSuffix(name, ".xlsx") || strings.HasSuffix(name, ".txt") {
				excelFile = name
				break
			}
		}
		if excelFile != "" {
			log.Printf("Found %s file: %s", keyword, excelFile)
			break
		}
	}

	if excelFile == "" {
		// List every file in the directory to help with debugging.
		var fileList []string
		for _, file := range files {
			fileList = append(fileList, file.Name())
		}
		log.Printf("Files in directory %s: %v", latestDir, fileList)
		writeJSON(w, Response{Success: false, Message: "未找到Excel文件,目录中的文件: " + strings.Join(fileList, ", ")})
		return
	}

	writeJSON(w, Response{
		Success: true,
		Message: "文章列表获取成功",
		Data: map[string]interface{}{
			"account":  latestDir,
			"filename": excelFile,
			// Must match the "/api/download/" route registered in main.
			"download": fmt.Sprintf("/api/download/%s/%s", latestDir, excelFile),
		},
	})
}

// batchDownloadHandler runs crawler option 5 (batch download) for an
// official account and reports how many articles were saved.
func batchDownloadHandler(w http.ResponseWriter, r *http.Request) {
	var req struct {
		OfficialAccount string `json:"official_account"`
		SaveImage       bool   `json:"save_image"`
		SaveContent     bool   `json:"save_content"` // accepted but not forwarded to the crawler
	}

	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		writeJSON(w, Response{Success: false, Message: "请求参数错误"})
		return
	}

	currentTask.Running = true
	currentTask.Progress = 0
	currentTask.Message = "正在批量下载文章..."

	// Run the crawler synchronously (option 5).
	exePath := filepath.Join("..", "wechat-crawler.exe")
	absPath, _ := filepath.Abs(exePath)
	workDir, _ := filepath.Abs("..")

	log.Printf("Starting option 5: %s, working directory: %s", absPath, workDir)
	cmd := exec.Command(absPath)
	cmd.Dir = workDir

	// Create the stdin pipe used to answer the crawler's prompts.
	stdin, err := cmd.StdinPipe()
	if err != nil {
		log.Printf("Failed to create stdin pipe: %v", err)
		currentTask.Running = false
		writeJSON(w, Response{Success: false, Message: "创建输入管道失败: " + err.Error()})
		return
	}

	// Start the command.
	if err := cmd.Start(); err != nil {
		log.Printf("Failed to start command: %v", err)
		currentTask.Running = false
		writeJSON(w, Response{Success: false, Message: "启动命令失败: " + err.Error()})
		return
	}

	// Answer the prompts: option "5" (batch download), then the account.
	fmt.Fprintln(stdin, "5")
	fmt.Fprintln(stdin, req.OfficialAccount)

	// Whether to save images.
	if req.SaveImage {
		fmt.Fprintln(stdin, "y")
	} else {
		fmt.Fprintln(stdin, "n")
	}
	stdin.Close()

	// Wait for the command to finish.
	if err := cmd.Wait(); err != nil {
		log.Printf("Command failed: %v", err)
		currentTask.Running = false
		writeJSON(w, Response{Success: false, Message: "命令执行失败: " + err.Error()})
		return
	}

	currentTask.Running = false
	currentTask.Progress = 100
	currentTask.Message = "批量下载完成"

	// Count the downloaded articles. This lookup only finds the directory when
	// official_account is the account name; for an article link the count
	// stays 0.
	accountPath := filepath.Join("../data", req.OfficialAccount, "文章详细")
	var articleCount int
	if entries, err := os.ReadDir(accountPath); err == nil {
		articleCount = len(entries)
	}

	writeJSON(w, Response{
		Success: true,
		Message: fmt.Sprintf("批量下载完成,共下载 %d 篇文章", articleCount),
		Data: map[string]interface{}{
			"account":      req.OfficialAccount,
			"articleCount": articleCount,
			"path":         accountPath,
		},
	})
}

// getDataListHandler lists the official accounts that have been downloaded.
func getDataListHandler(w http.ResponseWriter, r *http.Request) {
	dataDir := "../data"
	// Start from an empty, non-nil slice so that "data" encodes as []
	// rather than null when there is nothing to list.
	accounts := []map[string]interface{}{}

	entries, err := os.ReadDir(dataDir)
	if err != nil {
		// If the directory does not exist, return an empty list instead of an error.
		writeJSON(w, Response{
			Success: true,
			Data:    accounts,
		})
		return
	}

	for _, entry := range entries {
		if !entry.IsDir() {
			continue
		}
		accountPath := filepath.Join(dataDir, entry.Name())

		// Count the articles.
		detailPath := filepath.Join(accountPath, "文章详细")
		var articleCount int
		if detailEntries, err := os.ReadDir(detailPath); err == nil {
			articleCount = len(detailEntries)
		}

		// Last modification date of the account directory; left empty if the
		// directory information cannot be read.
		var lastUpdate string
		if info, err := entry.Info(); err == nil {
			lastUpdate = info.ModTime().Format("2006-01-02")
		}

		accounts = append(accounts, map[string]interface{}{
			"name":         entry.Name(),
			"articleCount": articleCount,
			"path":         accountPath,
			"lastUpdate":   lastUpdate,
		})
	}

	writeJSON(w, Response{
		Success: true,
		Data:    accounts,
	})
}

// getTaskStatusHandler returns the status of the current task.
func getTaskStatusHandler(w http.ResponseWriter, r *http.Request) {
	writeJSON(w, Response{
		Success: true,
		Data:    currentTask,
	})
}

// downloadFileHandler serves a generated file from the data directory.
func downloadFileHandler(w http.ResponseWriter, r *http.Request) {
	// The URL has the form /api/download/<account name>/<file name>.
	path := strings.TrimPrefix(r.URL.Path, "/api/download/")
	parts := strings.SplitN(path, "/", 2)

	if len(parts) != 2 {
		http.Error(w, "路径错误", http.StatusBadRequest)
		return
	}

	accountName := parts[0]
	filename := parts[1]

	// Build the absolute file path under ../data.
	dataRoot, _ := filepath.Abs(filepath.Join("..", "data"))
	absPath := filepath.Join(dataRoot, accountName, filename)

	// Reject paths that escape the data directory (for example via "..").
	if !strings.HasPrefix(absPath, dataRoot+string(filepath.Separator)) {
		http.Error(w, "路径错误", http.StatusBadRequest)
		return
	}

	// Check that the file exists and is not a directory.
	if info, err := os.Stat(absPath); err != nil || info.IsDir() {
		http.Error(w, "文件不存在", http.StatusNotFound)
		return
	}

	log.Printf("Serving file: %s", absPath)

	// Set the response headers.
	contentType := "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
	if strings.HasSuffix(filename, ".txt") {
		contentType = "text/plain; charset=utf-8"
	}
	w.Header().Set("Content-Type", contentType)
	// NOTE: the filename*=UTF-8'' form expects a percent-encoded value; the
	// name is passed through unencoded here.
	w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename*=UTF-8''%s", filename))

	// Send the file.
	http.ServeFile(w, r, absPath)
}

// writeJSON writes data as a JSON response.
func writeJSON(w http.ResponseWriter, data interface{}) {
	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	if err := json.NewEncoder(w).Encode(data); err != nil {
		log.Printf("Failed to write JSON response: %v", err)
	}
}
23
backend/api/start_api.bat
Normal file
@@ -0,0 +1,23 @@
@echo off
chcp 65001 >nul
title WeChat Official Account Article Crawler - API Server

:: Check whether api_server.exe exists
if not exist "api_server.exe" (
    echo ===============================================
    echo    ⚠️  API server has not been built
    echo ===============================================
    echo.
    echo Building the API server...
    echo.
    call build.bat
    rem "if errorlevel 1" is evaluated at run time, after build.bat has returned
    if errorlevel 1 (
        echo Build failed; cannot start the server
        pause
        exit /b 1
    )
)

:: Start the API server
cls
api_server.exe