Compare commits
2 Commits
51096cc21d...4fef65bd93

| Author | SHA1 | Date |
|---|---|---|
| | 4fef65bd93 | |
| | 46de43ce72 | |
@@ -1,73 +0,0 @@
# WeChat Official Account Article Crawler (Go Version)

A Go tool that crawls WeChat official account articles. It automatically fetches the full article list and the detailed content of each article for a given account.

## Features

- Fetch the full article list of an official account
- Fetch the detailed content of every article
- Fetch per-article statistics such as read, like, and share counts
- Fetch article comments
- Automatically save the article list and article content

## Requirements

- Go 1.20 or later
- Windows (the scripts are tailored to Windows)

## Installation and Usage

### 1. Configure the cookie

- Rename `cookie.txt.example` to `cookie.txt`
- Follow the instructions in that file to obtain a cookie from the WeChat Official Accounts Platform
- Paste the cookie into `cookie.txt`

### 2. Run the program

Double-click `run.bat`. The script automatically:

- Downloads the required dependencies
- Builds the Go program
- Runs the crawler

## Project Structure

```
backend/
├── cmd/
│   └── main.go                  # Program entry point
├── configs/
│   └── config.go                # Configuration management
├── pkg/
│   ├── utils/                   # Utility functions
│   │   └── utils.go
│   └── wechat/                  # WeChat-specific functionality
│       └── access_articles.go
├── data/                        # Data storage directory
├── cookie.txt                   # Cookie file (create it manually)
├── go.mod                       # Go module definition
├── run.bat                      # Windows launch script
└── README.md                    # Usage instructions
```

## Notes

1. Before using this tool, make sure you are authorized to access the official account in question
2. Comply with applicable laws and regulations and use the tool responsibly
3. Frequent requests may trigger WeChat's anti-crawling measures, so limit the crawl rate
4. WeChat's interfaces may change, so the tool may need to be adapted accordingly
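
The advice above to limit the crawl rate could be sketched as follows. This is a minimal illustration, not the tool's actual implementation: `fetchAll`, the two-second gap, and the placeholder URLs are all hypothetical. The pause is passed in as a function so the pacing logic stays separate from the real waiting.

```go
package main

import (
	"fmt"
	"time"
)

// fetchAll calls fetch for each URL in order and calls pause between
// consecutive requests, so requests are never sent back to back.
func fetchAll(urls []string, pause func(), fetch func(string) error) error {
	for i, u := range urls {
		if i > 0 {
			pause()
		}
		if err := fetch(u); err != nil {
			return fmt.Errorf("fetch %s: %w", u, err)
		}
	}
	return nil
}

func main() {
	urls := []string{"https://example.invalid/a", "https://example.invalid/b"}
	pause := func() { time.Sleep(2 * time.Second) } // hypothetical gap between requests
	err := fetchAll(urls, pause, func(u string) error {
		fmt.Println("would fetch", u) // placeholder for the real HTTP request
		return nil
	})
	fmt.Println("done, err =", err)
}
```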
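
## FAQ

### Q: What if obtaining the cookie fails?
A: Make sure you are logged in to the WeChat Official Accounts Platform and that you copied the complete cookie from the browser's developer tools.

### Q: What if a network error occurs while crawling?
A: The tool handles simple network errors automatically, so first check that your network connection works. If requests keep failing, WeChat's interface may have changed.

### Q: How do I change which official account is crawled?
A: The tool reads the accounts accessible to the currently logged-in user from the cookie. To crawl a different account, switch accounts on the WeChat Official Accounts Platform and obtain a new cookie.

## License

This project is intended for learning and research only.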
460
backend/api/API接口文档.md
Normal file
@@ -0,0 +1,460 @@
# 📡 WeChat Official Account Article Crawler - API Reference

## Server Information

- **Base URL**: http://localhost:8080
- **Protocol**: HTTP/1.1
- **Data format**: JSON
- **Character encoding**: UTF-8
- **CORS**: enabled (all origins allowed)

## Unified Response Format

Every API endpoint returns the same envelope:

```json
{
  "success": true,        // whether the request succeeded
  "message": "操作成功",   // status message (the server returns Chinese text)
  "data": {}              // payload (optional)
}
```
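
On the client side, the envelope above can be decoded once and reused for every endpoint. The sketch below is only an illustration: `Envelope` and `decodeEnvelope` are hypothetical names, and the choice to turn `success: false` into a Go error is an assumption about how a caller might want to consume the API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope mirrors the unified response format.
// Data stays as raw JSON so each endpoint can decode its own payload.
type Envelope struct {
	Success bool            `json:"success"`
	Message string          `json:"message"`
	Data    json.RawMessage `json:"data,omitempty"`
}

// decodeEnvelope parses a response body and maps success=false to an error.
func decodeEnvelope(body []byte) (Envelope, error) {
	var env Envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return env, fmt.Errorf("invalid response body: %w", err)
	}
	if !env.Success {
		return env, fmt.Errorf("API error: %s", env.Message)
	}
	return env, nil
}

func main() {
	ok := []byte(`{"success":true,"message":"提取成功","data":{"homepage":"https://example.invalid/home"}}`)
	env, err := decodeEnvelope(ok)
	fmt.Println(env.Message, string(env.Data), err)

	_, err = decodeEnvelope([]byte(`{"success":false,"message":"未能提取到主页链接"}`))
	fmt.Println(err)
}
```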
## Endpoints

### 1. Extract the Official Account Homepage

**Endpoint**: `/api/homepage/extract`
**Method**: POST
**Description**: Extracts the official account's homepage link from an article link

#### Request parameters

```json
{
  "url": "https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx"
}
```

| Parameter | Type | Required | Description |
|------|------|------|------|
| url | string | Yes | Link to an article of the official account |

#### Response examples

**Success**:
```json
{
  "success": true,
  "message": "提取成功",
  "data": {
    "homepage": "https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=xxx&scene=124",
    "output": "full command-line output"
  }
}
```

**Failure**:
```json
{
  "success": false,
  "message": "未能提取到主页链接"
}
```

#### Usage examples

**jQuery**:
```javascript
$.ajax({
  url: 'http://localhost:8080/api/homepage/extract',
  method: 'POST',
  contentType: 'application/json',
  data: JSON.stringify({
    url: 'https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx'
  }),
  success: function(response) {
    if (response.success) {
      console.log('Homepage link:', response.data.homepage);
    }
  }
});
```

**curl**:
```bash
curl -X POST http://localhost:8080/api/homepage/extract \
  -H "Content-Type: application/json" \
  -d '{"url":"https://mp.weixin.qq.com/s?__biz=xxx&mid=xxx"}'
```
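
The request and response examples above both carry a `__biz` value, which suggests how a homepage link relates to an article link. The sketch below is an assumption-laden illustration, not what the server does (the server delegates to `wechat-crawler.exe`): `homepageFromArticle` is a hypothetical helper that only works when the article URL has a `__biz` query parameter, and it emits the query parameters in sorted order rather than the order shown in the example.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// homepageFromArticle builds a profile_ext homepage link from an article URL.
// Assumption: the article URL carries a __biz query parameter.
func homepageFromArticle(articleURL string) (string, error) {
	u, err := url.Parse(articleURL)
	if err != nil {
		return "", fmt.Errorf("parse article URL: %w", err)
	}
	biz := u.Query().Get("__biz")
	if biz == "" {
		return "", errors.New("article URL has no __biz parameter")
	}
	q := url.Values{}
	q.Set("action", "home")
	q.Set("__biz", biz)
	q.Set("scene", "124")
	home := url.URL{
		Scheme:   "https",
		Host:     "mp.weixin.qq.com",
		Path:     "/mp/profile_ext",
		RawQuery: q.Encode(), // Encode sorts keys and percent-encodes values
	}
	return home.String(), nil
}

func main() {
	home, err := homepageFromArticle("http://mp.weixin.qq.com/s?__biz=MzI3NzQzODQ5OA==&mid=2247500657&idx=1")
	fmt.Println(home, err)
}
```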

---

### 2. Download a Single Article

**Endpoint**: `/api/article/download`
**Method**: POST
**Description**: Downloads the specified article

#### Request parameters

```json
{
  "url": "https://mp.weixin.qq.com/s?__biz=xxx",
  "save_image": true,
  "save_content": true
}
```

| Parameter | Type | Required | Description |
|------|------|------|------|
| url | string | Yes | Article link |
| save_image | boolean | No | Whether to save images (default false) |
| save_content | boolean | No | Whether to save content (default true) |

#### Response example

```json
{
  "success": true,
  "message": "下载任务已启动",
  "data": {
    "url": "https://mp.weixin.qq.com/s?__biz=xxx"
  }
}
```

---

### 3. Get the Article List

**Endpoint**: `/api/article/list`
**Method**: POST
**Description**: Fetches an official account's article list in bulk

#### Request parameters

```json
{
  "access_token": "https://mp.weixin.qq.com/mp/profile_ext?action=xxx&appmsg_token=xxx",
  "pages": 0
}
```

| Parameter | Type | Required | Description |
|------|------|------|------|
| access_token | string | Yes | URL that contains the `appmsg_token` |
| pages | integer | No | Number of pages to fetch; 0 means all (default 0) |

#### Response example

```json
{
  "success": true,
  "message": "任务已启动"
}
```

---

### 4. Batch-Download Articles

**Endpoint**: `/api/article/batch`
**Method**: POST
**Description**: Downloads all articles of an official account

#### Request parameters

```json
{
  "official_account": "official account name or article link",
  "save_image": true,
  "save_content": true
}
```

| Parameter | Type | Required | Description |
|------|------|------|------|
| official_account | string | Yes | Official account name, or a link to any of its articles |
| save_image | boolean | No | Whether to save images (default false) |
| save_content | boolean | No | Whether to save content (default true) |

#### Response example

```json
{
  "success": true,
  "message": "任务已启动"
}
```

---

### 5. Get the Data List

**Endpoint**: `/api/data/list`
**Method**: GET
**Description**: Lists the official account data that has already been downloaded

#### Request parameters

None

#### Response example

```json
{
  "success": true,
  "data": [
    {
      "name": "研招网资讯",
      "articleCount": 125,
      "path": "D:\\workspace\\Access_wechat_article\\backend\\data\\研招网资讯",
      "lastUpdate": "2025-11-27"
    }
  ]
}
```

| Field | Type | Description |
|------|------|------|
| name | string | Official account name |
| articleCount | integer | Number of articles |
| path | string | Storage path |
| lastUpdate | string | Last update date |

#### Usage examples

**jQuery**:
```javascript
$.get('http://localhost:8080/api/data/list', function(response) {
  if (response.success) {
    console.log('Data list:', response.data);
  }
});
```

**curl**:
```bash
curl http://localhost:8080/api/data/list
```

---

### 6. Get the Task Status

**Endpoint**: `/api/task/status`
**Method**: GET
**Description**: Returns the execution status of the current task

#### Request parameters

None

#### Response examples

**Task running**:
```json
{
  "success": true,
  "data": {
    "running": true,
    "progress": 45,
    "message": "正在下载第10篇文章..."
  }
}
```

**No task running**:
```json
{
  "success": true,
  "data": {
    "running": false,
    "progress": 0,
    "message": ""
  }
}
```

| Field | Type | Description |
|------|------|------|
| running | boolean | Whether a task is currently running |
| progress | integer | Task progress (0-100) |
| message | string | Task status description |
| error | string | Error message (optional) |
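
A client would typically poll this endpoint until `running` turns false. The sketch below is one possible shape under stated assumptions: `taskStatus`, `fetchStatus`, and `waitForIdle` are hypothetical names, and the two-second interval and 30-poll limit are arbitrary. The fetch step is passed in as a function, so the polling loop can be exercised without a live server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// taskStatus mirrors the fields in the table above.
type taskStatus struct {
	Running  bool   `json:"running"`
	Progress int    `json:"progress"`
	Message  string `json:"message"`
	Error    string `json:"error,omitempty"`
}

// fetchStatus performs one GET against /api/task/status.
func fetchStatus(client *http.Client, baseURL string) (taskStatus, error) {
	var env struct {
		Success bool       `json:"success"`
		Data    taskStatus `json:"data"`
	}
	resp, err := client.Get(baseURL + "/api/task/status")
	if err != nil {
		return taskStatus{}, err
	}
	defer resp.Body.Close()
	if err := json.NewDecoder(resp.Body).Decode(&env); err != nil {
		return taskStatus{}, err
	}
	if !env.Success {
		return taskStatus{}, fmt.Errorf("status request reported failure")
	}
	return env.Data, nil
}

// waitForIdle polls until running is false or maxPolls attempts are used up.
func waitForIdle(fetch func() (taskStatus, error), interval time.Duration, maxPolls int) (taskStatus, error) {
	for i := 0; i < maxPolls; i++ {
		st, err := fetch()
		if err != nil {
			return st, err
		}
		if !st.Running {
			return st, nil
		}
		time.Sleep(interval)
	}
	return taskStatus{}, fmt.Errorf("task still running after %d polls", maxPolls)
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	fetch := func() (taskStatus, error) { return fetchStatus(client, "http://localhost:8080") }
	st, err := waitForIdle(fetch, 2*time.Second, 30)
	fmt.Println(st, err)
}
```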

---

## Error Codes

### HTTP status codes

| Status code | Description |
|--------|------|
| 200 | Request succeeded |
| 400 | Invalid request parameters |
| 500 | Internal server error |

### Business errors

All business errors are reported through the `success` and `message` fields of the response:

```json
{
  "success": false,
  "message": "specific error message"
}
```

Common error messages:

| Message | Meaning | Resolution |
|----------|------|----------|
| `请求参数错误` | Invalid request: the JSON is malformed or a required parameter is missing | Check the request body format |
| `执行失败` | The backend program failed to run | Inspect the detailed error output |
| `未能提取到主页链接` | The article link is malformed or could not be parsed | Use a valid article link |
| `读取数据目录失败` | The `data` directory does not exist or is not accessible | Check the directory permissions |

---

## Development Guide

### Local testing

1. **Start the API server**:
```bat
cd backend\api
start_api.bat
```

2. **Test the endpoints**:
```bash
# Test homepage extraction
curl -X POST http://localhost:8080/api/homepage/extract \
  -H "Content-Type: application/json" \
  -d "{\"url\":\"<article URL>\"}"

# Test the data list
curl http://localhost:8080/api/data/list
```

### CORS configuration

The API server enables CORS and allows requests from any origin:

```go
w.Header().Set("Access-Control-Allow-Origin", "*")
w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
```

To restrict access to specific origins, modify the `corsMiddleware` function in `server.go`.
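
One way to restrict origins is to echo the `Origin` header back only when it appears on an allowlist. The following is a sketch under assumptions, not the project's code: `corsAllowlist`, `allowedOrigins`, and the `http://localhost:3000` entry are hypothetical, and `allowOriginFor` exists only to show the middleware's effect.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// allowedOrigins is a hypothetical allowlist; replace it with the real front-end origins.
var allowedOrigins = map[string]bool{
	"http://localhost:3000": true,
}

// corsAllowlist grants CORS headers only to origins on the allowlist.
func corsAllowlist(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		origin := r.Header.Get("Origin")
		if allowedOrigins[origin] {
			w.Header().Set("Access-Control-Allow-Origin", origin)
			w.Header().Set("Vary", "Origin") // responses differ per origin
			w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
			w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
		}
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next(w, r)
	}
}

// allowOriginFor reports which origin the middleware grants for a request.
func allowOriginFor(origin string) string {
	h := corsAllowlist(func(w http.ResponseWriter, r *http.Request) {})
	req := httptest.NewRequest(http.MethodGet, "/api/data/list", nil)
	req.Header.Set("Origin", origin)
	rec := httptest.NewRecorder()
	h(rec, req)
	return rec.Header().Get("Access-Control-Allow-Origin")
}

func main() {
	fmt.Printf("%q\n", allowOriginFor("http://localhost:3000"))
	fmt.Printf("%q\n", allowOriginFor("http://evil.example"))
}
```

An origin that is not on the list simply receives no CORS headers, so the browser blocks the cross-origin response.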
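
### Timeouts

`server.go` starts the server with `http.ListenAndServe`, which applies no read or write timeouts. To enforce timeouts (30 seconds in this example), construct an `http.Server` in `server.go` instead. Keep in mind that `/api/article/list` and `/api/article/batch` block until the crawler exits, so a short `WriteTimeout` can cut off their responses:

```go
server := &http.Server{
    Addr:         ":8080",
    ReadTimeout:  30 * time.Second,
    WriteTimeout: 30 * time.Second,
}
log.Fatal(server.ListenAndServe())
```

### Logging

The API server writes its logs to the console through the standard `log` package:

```go
log.Printf("[%s] %s - %s", r.Method, r.URL.Path, message)
```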

---

## API Roadmap

### v1.1.0 (planned)
- [ ] Add user authentication
- [ ] Support task queue management
- [ ] Push download progress (WebSocket)
- [ ] Provide an article search endpoint

### v1.2.0 (planned)
- [ ] Statistics and analytics endpoints
- [ ] Export (PDF/Word)
- [ ] Batch task management
- [ ] Scheduled tasks

---

## Tech Stack

- **Language**: Go 1.20+
- **Web framework**: net/http (standard library)
- **Data format**: JSON
- **Concurrency model**: Goroutine

---

## Performance

### Concurrency
- Multiple clients can access the server at the same time
- Only one crawler task can run at any given time (`currentTask`)

### Resource usage
- CPU: low (the work is mostly I/O)
- Memory: under 50 MB
- Disk: depends on the number of downloaded articles

### Optimization suggestions
1. Use a connection pool for HTTP requests
2. Implement a task queue
3. Cache results
4. Enable gzip compression
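
Since any client can hit the server while a task is in progress, the single-task rule noted under Concurrency is easiest to enforce with a lock around the shared status. The sketch below is a hypothetical helper, not code from `server.go`: `taskTracker`, `TryStart`, `Finish`, and `Snapshot` are illustrative names, and a handler would be expected to refuse new work when `TryStart` returns false.

```go
package main

import (
	"fmt"
	"sync"
)

// TaskStatus matches the fields served by /api/task/status.
type TaskStatus struct {
	Running  bool   `json:"running"`
	Progress int    `json:"progress"`
	Message  string `json:"message"`
	Error    string `json:"error,omitempty"`
}

// taskTracker guards the single task slot with a mutex (hypothetical helper).
type taskTracker struct {
	mu     sync.Mutex
	status TaskStatus
}

// TryStart claims the slot and reports false if a task is already running.
func (t *taskTracker) TryStart(message string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.status.Running {
		return false
	}
	t.status = TaskStatus{Running: true, Message: message}
	return true
}

// Finish releases the slot and records the outcome.
func (t *taskTracker) Finish(message string, err error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.status.Running = false
	t.status.Message = message
	if err != nil {
		t.status.Error = err.Error()
	} else {
		t.status.Progress = 100
	}
}

// Snapshot returns a copy of the status that is safe to encode as JSON.
func (t *taskTracker) Snapshot() TaskStatus {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.status
}

func main() {
	var tracker taskTracker
	fmt.Println(tracker.TryStart("task A"))
	fmt.Println(tracker.TryStart("task B"))
	tracker.Finish("done", nil)
	fmt.Printf("%+v\n", tracker.Snapshot())
}
```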

---

## Security Recommendations

### 1. Production deployment
- Add HTTPS support
- Implement API authentication (JWT/OAuth)
- Restrict the allowed CORS origins
- Add request rate limiting

### 2. Data security
- Do not expose sensitive information (the cookie)
- Clean up temporary files regularly
- Back up important data

### 3. Access control
- Add an IP allowlist
- Implement user permission management
- Keep an audit log of operations

---

## FAQ

### Q1: Why is there no response after a task starts?
A: Check that the backend `wechat-crawler.exe` exists and is executable.

### Q2: How do I see detailed error information?
A: Check the console output in the API server window.

### Q3: Can several download tasks run at the same time?
A: Not in the current version; only one task can run at a time.

### Q4: How do I stop a running task?
A: Close the API server window or restart the server.

---

**Document version**: v1.0.0
**Last updated**: 2025-11-27
**Maintainer**: AI Assistant
26
backend/api/build.bat
Normal file
@@ -0,0 +1,26 @@
|
||||
@echo off
|
||||
chcp 65001 >nul
|
||||
echo ===============================================
|
||||
echo 📦 编译 API 服务器
|
||||
echo ===============================================
|
||||
echo.
|
||||
|
||||
echo 🔨 正在编译 api_server.exe...
|
||||
go build -o api_server.exe server.go
|
||||
|
||||
if %errorlevel% neq 0 (
|
||||
echo.
|
||||
echo ❌ 编译失败!
|
||||
echo.
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
echo.
|
||||
echo ✅ 编译成功!
|
||||
echo 📁 输出文件: api_server.exe
|
||||
echo.
|
||||
echo ===============================================
|
||||
echo 编译完成
|
||||
echo ===============================================
|
||||
pause
|
||||
543
backend/api/server.go
Normal file
@@ -0,0 +1,543 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"log"
|
||||
"net/http"
|
||||
"os"
|
||||
"os/exec"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"time"
|
||||
)
|
||||
|
||||
// Response 统一响应结构
|
||||
type Response struct {
|
||||
Success bool `json:"success"`
|
||||
Message string `json:"message"`
|
||||
Data interface{} `json:"data,omitempty"`
|
||||
}
|
||||
|
||||
// 任务状态
|
||||
type TaskStatus struct {
|
||||
Running bool `json:"running"`
|
||||
Progress int `json:"progress"`
|
||||
Message string `json:"message"`
|
||||
Error string `json:"error,omitempty"`
|
||||
}
|
||||
|
||||
var currentTask = &TaskStatus{Running: false}
|
||||
|
||||
func main() {
|
||||
// 启用CORS
|
||||
http.HandleFunc("/", corsMiddleware(handleRoot))
|
||||
http.HandleFunc("/api/homepage/extract", corsMiddleware(extractHomepageHandler))
|
||||
http.HandleFunc("/api/article/download", corsMiddleware(downloadArticleHandler))
|
||||
http.HandleFunc("/api/article/list", corsMiddleware(getArticleListHandler))
|
||||
http.HandleFunc("/api/article/batch", corsMiddleware(batchDownloadHandler))
|
||||
http.HandleFunc("/api/data/list", corsMiddleware(getDataListHandler))
|
||||
http.HandleFunc("/api/task/status", corsMiddleware(getTaskStatusHandler))
|
||||
http.HandleFunc("/api/download/", corsMiddleware(downloadFileHandler))
|
||||
|
||||
port := ":8080"
|
||||
fmt.Println("===============================================")
|
||||
fmt.Println(" 🚀 微信公众号文章爬虫 API 服务器")
|
||||
fmt.Println("===============================================")
|
||||
fmt.Printf("🌐 服务地址: http://localhost%s\n", port)
|
||||
fmt.Printf("⏰ 启动时间: %s\n", time.Now().Format("2006-01-02 15:04:05"))
|
||||
fmt.Println("===============================================")
fmt.Println()
|
||||
|
||||
if err := http.ListenAndServe(port, nil); err != nil {
|
||||
log.Fatal("服务器启动失败:", err)
|
||||
}
|
||||
}
|
||||
|
||||
// CORS中间件
|
||||
func corsMiddleware(next http.HandlerFunc) http.HandlerFunc {
|
||||
return func(w http.ResponseWriter, r *http.Request) {
|
||||
w.Header().Set("Access-Control-Allow-Origin", "*")
|
||||
w.Header().Set("Access-Control-Allow-Methods", "GET, POST, OPTIONS")
|
||||
w.Header().Set("Access-Control-Allow-Headers", "Content-Type")
|
||||
|
||||
if r.Method == "OPTIONS" {
|
||||
w.WriteHeader(http.StatusOK)
|
||||
return
|
||||
}
|
||||
|
||||
next(w, r)
|
||||
}
|
||||
}
|
||||
|
||||
// 首页处理
|
||||
func handleRoot(w http.ResponseWriter, r *http.Request) {
|
||||
w.Header().Set("Content-Type", "text/html; charset=utf-8")
|
||||
html := `
|
||||
<!DOCTYPE html>
|
||||
<html>
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<title>微信公众号文章爬虫 API</title>
|
||||
<style>
|
||||
body { font-family: Arial, sans-serif; max-width: 800px; margin: 50px auto; padding: 20px; }
|
||||
h1 { color: #333; }
|
||||
.endpoint { background: #f5f5f5; padding: 10px; margin: 10px 0; border-radius: 5px; }
|
||||
.method { color: #4CAF50; font-weight: bold; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>🚀 微信公众号文章爬虫 API 服务器</h1>
|
||||
<p>当前时间: ` + time.Now().Format("2006-01-02 15:04:05") + `</p>
|
||||
<h2>可用接口:</h2>
|
||||
<div class="endpoint">
|
||||
<span class="method">POST</span> /api/homepage/extract - 提取公众号主页
|
||||
</div>
|
||||
<div class="endpoint">
|
||||
<span class="method">POST</span> /api/article/download - 下载单篇文章
|
||||
</div>
|
||||
<div class="endpoint">
|
||||
<span class="method">POST</span> /api/article/list - 获取文章列表
|
||||
</div>
|
||||
<div class="endpoint">
|
||||
<span class="method">POST</span> /api/article/batch - 批量下载文章
|
||||
</div>
|
||||
<div class="endpoint">
|
||||
<span class="method">GET</span> /api/data/list - 获取数据列表
|
||||
</div>
|
||||
<div class="endpoint">
|
||||
<span class="method">GET</span> /api/task/status - 获取任务状态
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
`
|
||||
w.Write([]byte(html))
|
||||
}
|
||||
|
||||
// 提取公众号主页
|
||||
func extractHomepageHandler(w http.ResponseWriter, r *http.Request) {
|
||||
var req struct {
|
||||
URL string `json:"url"`
|
||||
}
|
||||
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
writeJSON(w, Response{Success: false, Message: "请求参数错误"})
|
||||
return
|
||||
}
|
||||
|
||||
// 执行命令(使用绝对路径)
|
||||
exePath := filepath.Join("..", "wechat-crawler.exe")
|
||||
absPath, _ := filepath.Abs(exePath)
|
||||
log.Printf("尝试执行: %s", absPath)
|
||||
|
||||
cmd := exec.Command(absPath, req.URL)
|
||||
workDir, _ := filepath.Abs("..")
|
||||
cmd.Dir = workDir
|
||||
output, err := cmd.CombinedOutput()
|
||||
|
||||
if err != nil {
|
||||
log.Printf("执行失败: %v, 输出: %s", err, string(output))
|
||||
writeJSON(w, Response{Success: false, Message: "执行失败: " + string(output)})
|
||||
return
|
||||
}
|
||||
|
||||
// 从输出中提取公众号主页链接
|
||||
outputStr := string(output)
|
||||
lines := strings.Split(outputStr, "\n")
|
||||
var homepageURL string
|
||||
|
||||
for _, line := range lines {
|
||||
if strings.Contains(line, "公众号主页链接") || strings.Contains(line, "https://mp.weixin.qq.com/mp/profile_ext") {
|
||||
// 提取URL
|
||||
if idx := strings.Index(line, "https://"); idx != -1 {
|
||||
homepageURL = strings.TrimSpace(line[idx:])
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if homepageURL == "" {
|
||||
writeJSON(w, Response{Success: false, Message: "未能提取到主页链接"})
|
||||
return
|
||||
}
|
||||
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Message: "提取成功",
|
||||
Data: map[string]string{
|
||||
"homepage": homepageURL,
|
||||
"output": outputStr,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
// 下载单篇文章(这里需要实现具体逻辑)
|
||||
func downloadArticleHandler(w http.ResponseWriter, r *http.Request) {
|
||||
var req struct {
|
||||
URL string `json:"url"`
|
||||
SaveImage bool `json:"save_image"`
|
||||
SaveContent bool `json:"save_content"`
|
||||
}
|
||||
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
writeJSON(w, Response{Success: false, Message: "请求参数错误"})
|
||||
return
|
||||
}
|
||||
|
||||
currentTask.Running = true
|
||||
currentTask.Progress = 0
|
||||
currentTask.Message = "正在下载文章..."
|
||||
|
||||
// 注意:这里需要实际调用爬虫的下载功能
|
||||
// 由于当前后端程序没有单独的下载单篇文章的命令行接口
|
||||
// 需要后续实现或使用其他方式
|
||||
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Message: "下载任务已启动",
|
||||
Data: map[string]interface{}{
|
||||
"url": req.URL,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
// 获取文章列表
|
||||
func getArticleListHandler(w http.ResponseWriter, r *http.Request) {
|
||||
var req struct {
|
||||
AccessToken string `json:"access_token"`
|
||||
Pages int `json:"pages"`
|
||||
}
|
||||
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
writeJSON(w, Response{Success: false, Message: "请求参数错误"})
|
||||
return
|
||||
}
|
||||
|
||||
currentTask.Running = true
|
||||
currentTask.Progress = 0
|
||||
currentTask.Message = "正在获取文章列表..."
|
||||
|
||||
// 同步执行爬虫程序(功能3)
|
||||
exePath := filepath.Join("..", "wechat-crawler.exe")
|
||||
absPath, _ := filepath.Abs(exePath)
|
||||
workDir, _ := filepath.Abs("..")
|
||||
|
||||
log.Printf("启动功能3: %s, 工作目录: %s", absPath, workDir)
|
||||
cmd := exec.Command(absPath)
|
||||
cmd.Dir = workDir
|
||||
|
||||
// 创建输入管道
|
||||
stdin, err := cmd.StdinPipe()
|
||||
if err != nil {
|
||||
log.Printf("创建输入管道失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "创建输入管道失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
// 启动命令
|
||||
if err := cmd.Start(); err != nil {
|
||||
log.Printf("启动命令失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "启动命令失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
// 发送选项"3"(功能3:通过access_token获取文章列表)
|
||||
fmt.Fprintln(stdin, "3")
|
||||
fmt.Fprintln(stdin, req.AccessToken)
|
||||
if req.Pages > 0 {
|
||||
fmt.Fprintf(stdin, "%d\n", req.Pages)
|
||||
} else {
|
||||
fmt.Fprintln(stdin, "0")
|
||||
}
|
||||
stdin.Close()
|
||||
|
||||
// 等待命令完成
|
||||
if err := cmd.Wait(); err != nil {
|
||||
log.Printf("命令执行失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "命令执行失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
currentTask.Running = false
|
||||
currentTask.Progress = 100
|
||||
currentTask.Message = "文章列表获取完成"
|
||||
|
||||
// 查找生成的文件并返回下载链接
|
||||
dataDir := "../data"
|
||||
entries, err := os.ReadDir(dataDir)
|
||||
if err != nil {
|
||||
writeJSON(w, Response{Success: false, Message: "读取数据目录失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
// 查找最新创建的公众号目录
|
||||
var latestDir string
|
||||
var latestTime time.Time
|
||||
	for _, entry := range entries {
		if !entry.IsDir() {
			continue
		}
		info, err := entry.Info()
		if err != nil {
			// Skip entries whose metadata cannot be read instead of dereferencing a nil FileInfo
			continue
		}
		if info.ModTime().After(latestTime) {
			latestTime = info.ModTime()
			latestDir = entry.Name()
		}
	}
|
||||
|
||||
if latestDir == "" {
|
||||
writeJSON(w, Response{Success: false, Message: "未找到生成的数据目录"})
|
||||
return
|
||||
}
|
||||
|
||||
log.Printf("找到最新目录: %s", latestDir)
|
||||
|
||||
// 查找文章列表文件(优先查找直连链接文件)
|
||||
accountPath := filepath.Join(dataDir, latestDir)
|
||||
files, err := os.ReadDir(accountPath)
|
||||
if err != nil {
|
||||
writeJSON(w, Response{Success: false, Message: "读取公众号目录失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
var excelFile string
|
||||
// 优先查找直连链接文件(.xlsx或.txt)
|
||||
for _, file := range files {
|
||||
if !file.IsDir() && strings.Contains(file.Name(), "直连链接") {
|
||||
if strings.HasSuffix(file.Name(), ".xlsx") || strings.HasSuffix(file.Name(), ".txt") {
|
||||
excelFile = file.Name()
|
||||
log.Printf("找到直连链接文件: %s", excelFile)
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// 如果没有直连链接文件,查找原始链接文件
|
||||
if excelFile == "" {
|
||||
for _, file := range files {
|
||||
if !file.IsDir() && strings.Contains(file.Name(), "原始链接") {
|
||||
if strings.HasSuffix(file.Name(), ".xlsx") || strings.HasSuffix(file.Name(), ".txt") {
|
||||
excelFile = file.Name()
|
||||
log.Printf("找到原始链接文件: %s", excelFile)
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// 如果还是没有,查找任何文章列表文件
|
||||
if excelFile == "" {
|
||||
for _, file := range files {
|
||||
if !file.IsDir() && strings.Contains(file.Name(), "文章列表") {
|
||||
if strings.HasSuffix(file.Name(), ".xlsx") || strings.HasSuffix(file.Name(), ".txt") {
|
||||
excelFile = file.Name()
|
||||
log.Printf("找到文章列表文件: %s", excelFile)
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if excelFile == "" {
|
||||
// 列出所有文件用于调试
|
||||
var fileList []string
|
||||
for _, file := range files {
|
||||
fileList = append(fileList, file.Name())
|
||||
}
|
||||
log.Printf("目录 %s 中的文件: %v", latestDir, fileList)
|
||||
writeJSON(w, Response{Success: false, Message: "未找到Excel文件,目录中的文件: " + strings.Join(fileList, ", ")})
|
||||
return
|
||||
}
|
||||
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Message: "文章列表获取成功",
|
||||
Data: map[string]interface{}{
|
||||
"account": latestDir,
|
||||
"filename": excelFile,
|
||||
"download": fmt.Sprintf("/api/download/%s/%s", latestDir, excelFile),
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
// 批量下载文章
|
||||
func batchDownloadHandler(w http.ResponseWriter, r *http.Request) {
|
||||
var req struct {
|
||||
OfficialAccount string `json:"official_account"`
|
||||
SaveImage bool `json:"save_image"`
|
||||
SaveContent bool `json:"save_content"`
|
||||
}
|
||||
|
||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
||||
writeJSON(w, Response{Success: false, Message: "请求参数错误"})
|
||||
return
|
||||
}
|
||||
|
||||
currentTask.Running = true
|
||||
currentTask.Progress = 0
|
||||
currentTask.Message = "正在批量下载文章..."
|
||||
|
||||
// 同步执行爬虫程序(功能5)
|
||||
exePath := filepath.Join("..", "wechat-crawler.exe")
|
||||
absPath, _ := filepath.Abs(exePath)
|
||||
workDir, _ := filepath.Abs("..")
|
||||
|
||||
log.Printf("启动功能5: %s, 工作目录: %s", absPath, workDir)
|
||||
cmd := exec.Command(absPath)
|
||||
cmd.Dir = workDir
|
||||
|
||||
// 创建输入管道
|
||||
stdin, err := cmd.StdinPipe()
|
||||
if err != nil {
|
||||
log.Printf("创建输入管道失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "创建输入管道失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
// 启动命令
|
||||
if err := cmd.Start(); err != nil {
|
||||
log.Printf("启动命令失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "启动命令失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
// 发送选项"5"(功能5:批量下载)
|
||||
fmt.Fprintln(stdin, "5")
|
||||
fmt.Fprintln(stdin, req.OfficialAccount)
|
||||
|
||||
// 是否保存图片
|
||||
if req.SaveImage {
|
||||
fmt.Fprintln(stdin, "y")
|
||||
} else {
|
||||
fmt.Fprintln(stdin, "n")
|
||||
}
|
||||
stdin.Close()
|
||||
|
||||
// 等待命令完成
|
||||
if err := cmd.Wait(); err != nil {
|
||||
log.Printf("命令执行失败: %v", err)
|
||||
currentTask.Running = false
|
||||
writeJSON(w, Response{Success: false, Message: "命令执行失败: " + err.Error()})
|
||||
return
|
||||
}
|
||||
|
||||
currentTask.Running = false
|
||||
currentTask.Progress = 100
|
||||
currentTask.Message = "批量下载完成"
|
||||
|
||||
// 统计下载的文章数量
|
||||
accountPath := filepath.Join("../data", req.OfficialAccount, "文章详细")
|
||||
var articleCount int
|
||||
if entries, err := os.ReadDir(accountPath); err == nil {
|
||||
articleCount = len(entries)
|
||||
}
|
||||
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Message: fmt.Sprintf("批量下载完成,共下载 %d 篇文章", articleCount),
|
||||
Data: map[string]interface{}{
|
||||
"account": req.OfficialAccount,
|
||||
"articleCount": articleCount,
|
||||
"path": accountPath,
|
||||
},
|
||||
})
|
||||
}
|
||||
|
||||
// 获取数据列表
|
||||
func getDataListHandler(w http.ResponseWriter, r *http.Request) {
|
||||
dataDir := "../data"
|
||||
var accounts []map[string]interface{}
|
||||
|
||||
entries, err := os.ReadDir(dataDir)
|
||||
if err != nil {
|
||||
// 如果目录不存在,返回空列表而不是错误
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Data: accounts,
|
||||
})
|
||||
return
|
||||
}
|
||||
|
||||
for _, entry := range entries {
|
||||
if entry.IsDir() {
|
||||
accountPath := filepath.Join(dataDir, entry.Name())
|
||||
|
||||
// 统计文章数量
|
||||
detailPath := filepath.Join(accountPath, "文章详细")
|
||||
var articleCount int
|
||||
if detailEntries, err := os.ReadDir(detailPath); err == nil {
|
||||
articleCount = len(detailEntries)
|
||||
}
|
||||
|
||||
			// Last-modified date; skip entries whose metadata cannot be read
			info, err := entry.Info()
			if err != nil {
				continue
			}
			lastUpdate := info.ModTime().Format("2006-01-02")
|
||||
|
||||
accounts = append(accounts, map[string]interface{}{
|
||||
"name": entry.Name(),
|
||||
"articleCount": articleCount,
|
||||
"path": accountPath,
|
||||
"lastUpdate": lastUpdate,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Data: accounts,
|
||||
})
|
||||
}
|
||||
|
||||
// 获取任务状态
|
||||
func getTaskStatusHandler(w http.ResponseWriter, r *http.Request) {
|
||||
writeJSON(w, Response{
|
||||
Success: true,
|
||||
Data: currentTask,
|
||||
})
|
||||
}
|
||||
|
||||
// 下载文件处理
|
||||
func downloadFileHandler(w http.ResponseWriter, r *http.Request) {
|
||||
// 从 URL 中提取路径 /api/download/公众号名称/文件名
|
||||
path := strings.TrimPrefix(r.URL.Path, "/api/download/")
|
||||
parts := strings.SplitN(path, "/", 2)
|
||||
|
||||
if len(parts) != 2 {
|
||||
http.Error(w, "路径错误", http.StatusBadRequest)
|
||||
return
|
||||
}
|
||||
|
||||
accountName := parts[0]
|
||||
filename := parts[1]
|
||||
|
||||
// 构建完整文件路径
|
||||
filePath := filepath.Join("..", "data", accountName, filename)
|
||||
absPath, _ := filepath.Abs(filePath)
|
||||
|
||||
// 检查文件是否存在
|
||||
if _, err := os.Stat(absPath); os.IsNotExist(err) {
|
||||
http.Error(w, "文件不存在", http.StatusNotFound)
|
||||
return
|
||||
}
|
||||
|
||||
log.Printf("下载文件: %s", absPath)
|
||||
|
||||
// 设置响应头
|
||||
contentType := "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
|
||||
if strings.HasSuffix(filename, ".txt") {
|
||||
contentType = "text/plain; charset=utf-8"
|
||||
}
|
||||
w.Header().Set("Content-Type", contentType)
|
||||
w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename*=UTF-8''%s", filename))
|
||||
|
||||
// 发送文件
|
||||
http.ServeFile(w, r, absPath)
|
||||
}
|
||||
|
||||
// 写入JSON响应
|
||||
func writeJSON(w http.ResponseWriter, data interface{}) {
|
||||
w.Header().Set("Content-Type", "application/json; charset=utf-8")
|
||||
json.NewEncoder(w).Encode(data)
|
||||
}
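
In `downloadFileHandler` above, the file name is written into `filename*=UTF-8''...` as-is, although that header form expects a percent-encoded value. The sketch below shows one way to build the header. It is an illustration rather than the project's code: `encodeRFC5987` and `contentDisposition` are hypothetical helpers, and the set of characters left unescaped follows the `attr-char` rule of RFC 5987.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeRFC5987 percent-encodes every byte that is not an RFC 5987 attr-char.
func encodeRFC5987(s string) string {
	const attrChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~"
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i]
		if strings.IndexByte(attrChars, c) >= 0 {
			b.WriteByte(c)
		} else {
			fmt.Fprintf(&b, "%%%02X", c) // e.g. a space becomes %20
		}
	}
	return b.String()
}

// contentDisposition builds an attachment header value with a UTF-8 file name.
func contentDisposition(filename string) string {
	return "attachment; filename*=UTF-8''" + encodeRFC5987(filename)
}

func main() {
	fmt.Println(contentDisposition("文章列表(article_list)_直连链接.txt"))
}
```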
|
||||
23
backend/api/start_api.bat
Normal file
@@ -0,0 +1,23 @@
|
||||
@echo off
|
||||
chcp 65001 >nul
|
||||
title 微信公众号文章爬虫 - API服务器
|
||||
|
||||
:: 检查api_server.exe是否存在
|
||||
if not exist "api_server.exe" (
|
||||
echo ===============================================
|
||||
echo ⚠️ API服务器未编译
|
||||
echo ===============================================
|
||||
echo.
|
||||
echo 正在编译 API 服务器...
|
||||
echo.
|
||||
call build.bat
|
||||
if errorlevel 1 (
|
||||
echo 编译失败,无法启动服务器
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
)
|
||||
|
||||
:: 启动API服务器
|
||||
cls
|
||||
api_server.exe
|
||||
11
backend/cmd/data/研招网资讯/文章列表(article_list)_直连链接.txt
Normal file
@@ -0,0 +1,11 @@
|
||||
序号,创建时间,标题,链接
|
||||
1,0,专家分析2026年考研报名人数,http://mp.weixin.qq.com/s?__biz=MzI3NzQzODQ5OA==&mid=2247500657&idx=1&sn=81eae7df4bfa2fdfc8bca69389489c52&chksm=ea981e22044aca6bbe5633849bfcd4903cb6f491646cd2ccf9321f4d9c852c64fe036f033c14&scene=27#wechat_redirect
|
||||
2,0,教育部:2026年全国硕士研究生报名人数为343万,http://mp.weixin.qq.com/s?__biz=MzI3NzQzODQ5OA==&mid=2247500650&idx=1&sn=9f230bbfefb24d98c18e42bd3651ad53&chksm=eac72972d56ff9b66f3658f0c3b1e6e363e56ddf879d56aba9c9c8f587b53ef00bcabe7992ff&scene=27#wechat_redirect
|
||||
3,0,【小研来了】“务必再坚持坚持”,http://mp.weixin.qq.com/s?__biz=MzI3NzQzODQ5OA==&mid=2247500645&idx=1&sn=8e1d5921861dc4e3647f7bf8adaada81&chksm=ea26b17ce2f7255aacd9d1d6358c9aeb8d4e043c692efb8b4d8183cfc8363b3068be79d585c2&scene=27#wechat_redirect
|
||||
4,0,学累了不?点进来看看这4个“续航”方法,http://mp.weixin.qq.com/s?__biz=MzI3NzQzODQ5OA==&mid=2247500631&idx=1&sn=b640b0e43378e368166e50a7f46735f2&chksm=ea71f10a83b7811e1896cd9704eac5d064b763f3e020b5b37c72727c55bb1b0862a92e9c4cf0&scene=27#wechat_redirect
|
||||
5,0,教育部:在“双一流”建设高校开展科技教育硕士培养,http://mp.weixin.qq.com/s?__biz=MzI3NzQzODQ5OA==&mid=2247500589&idx=1&sn=539d1229c9475ba5a2371698a362e9a7&chksm=ea4f97d3831139a276e50050f2f3307868b9c6ec7eb115bb9e288312f08572c47128a8016dce&scene=27#wechat_redirect
|
||||
6,0,“研味儿”正浓,冲刺在即!请你一定别放弃,http://mp.weixin.qq.com/s?__biz=MzI3NzQzODQ5OA==&mid=2247500584&idx=1&sn=294b6ba8d12f0948913abf04af8cb188&chksm=ea4cfb5b16684bdd12634b6e46d8d8f3ab72ca9108be0d4d7f83dfded09c6ecb9f31b1531e31&scene=27#wechat_redirect
|
||||
7,0,4个思维升级,让我找回了读研的掌控感,http://mp.weixin.qq.com/s?__biz=MzI3NzQzODQ5OA==&mid=2247500579&idx=1&sn=fa00084c8711e3009ff7e31fe0b3bc51&chksm=eaff1ec212ddbb738d20542a965bbd1b79ae3a9d2e5af5704ddcf41de3a8b8d658e562771f0c&scene=27#wechat_redirect
|
||||
8,0,研考网上确认成功后,需重点关注四件事,http://mp.weixin.qq.com/s?__biz=MzI3NzQzODQ5OA==&mid=2247500569&idx=1&sn=7707b698932ff6847de39d7351d3ac98&chksm=ea402eec6a96125a5bb02600aff24c3c1211eb5aaf5347080bbfe1f5861e9ca97fe9c400df21&scene=27#wechat_redirect
|
||||
9,0,,
|
||||
10,0,【小研来了】“小研,没有准考证照片怎么办?”,http://mp.weixin.qq.com/s?__biz=MzI3NzQzODQ5OA==&mid=2247500553&idx=1&sn=4fc6fd69684f02222e72d457c1004a81&chksm=eafc91ea346080790f9b641495fc3d9e31302ee5c2c9957eb4fa2bc9a139eda78163899b9219&scene=27#wechat_redirect
|
||||
@@ -4,6 +4,7 @@ import (
|
||||
"fmt"
|
||||
"io/ioutil"
|
||||
"log"
|
||||
"net/url"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
@@ -600,21 +601,48 @@ func parseAccessTokenParams(accessToken string) (string, string, string, string,
|
||||
if err != nil {
|
||||
return "", "", "", "", fmt.Errorf("未找到__biz参数")
|
||||
}
|
||||
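	// URL-decode __biz; keep the raw value if decoding fails
	// (url.QueryUnescape returns an empty string on error)
	if decoded, decErr := url.QueryUnescape(biz); decErr != nil {
		fmt.Printf("警告: URL解码__biz失败: %v,使用原始值\n", decErr)
	} else {
		biz = decoded
	}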
|
||||
|
||||
uin, err := utils.ExtractFromRegex(accessToken, "uin=([^&]*)")
|
||||
if err != nil {
|
||||
return "", "", "", "", fmt.Errorf("未找到uin参数")
|
||||
}
|
||||
// URL解码uin参数
|
||||
uin, err = url.QueryUnescape(uin)
|
||||
if err != nil {
|
||||
fmt.Printf("警告: URL解码uin失败: %v,使用原始值\n", err)
|
||||
}
|
||||
|
||||
key, err := utils.ExtractFromRegex(accessToken, "key=([^&]*)")
|
||||
if err != nil {
|
||||
return "", "", "", "", fmt.Errorf("未找到key参数")
|
||||
}
|
||||
// URL解码key参数
|
||||
key, err = url.QueryUnescape(key)
|
||||
if err != nil {
|
||||
fmt.Printf("警告: URL解码key失败: %v,使用原始值\n", err)
|
||||
}
|
||||
|
||||
passTicket, err := utils.ExtractFromRegex(accessToken, "pass_ticket=([^&]*)")
|
||||
if err != nil {
|
||||
return "", "", "", "", fmt.Errorf("未找到pass_ticket参数")
|
||||
}
|
||||
// URL解码pass_ticket参数
|
||||
passTicket, err = url.QueryUnescape(passTicket)
|
||||
if err != nil {
|
||||
fmt.Printf("警告: URL解码pass_ticket失败: %v,使用原始值\n", err)
|
||||
}
|
||||
|
||||
// 打印解码后的参数用于调试
|
||||
fmt.Printf("\n提取到的参数(已解码):\n")
|
||||
fmt.Printf(" __biz: %s\n", biz)
|
||||
fmt.Printf(" uin: %s\n", uin)
|
||||
fmt.Printf(" key长度: %d 字符\n", len(key))
|
||||
fmt.Printf(" pass_ticket长度: %d 字符\n", len(passTicket))
|
||||
|
||||
return biz, uin, key, passTicket, nil
|
||||
}
|
||||
|
||||
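Review note: `url.QueryUnescape` returns an empty string together with the error, so assigning its result straight back to the parameter loses the raw value that the warning message promises to keep. The sketch below shows one way to factor the decode-with-fallback step into a single helper. The name `unescapeOrRaw` and its signature are assumptions made for illustration; nothing with that name exists in the diff.

```go
package main

import (
	"fmt"
	"net/url"
)

// unescapeOrRaw is a hypothetical helper (not part of the diff).
// It returns the URL-decoded form of raw, or raw itself when decoding fails.
func unescapeOrRaw(name, raw string) string {
	decoded, err := url.QueryUnescape(raw)
	if err != nil {
		// QueryUnescape returns "" on error, so fall back to the input explicitly.
		fmt.Printf("warning: failed to URL-decode %s: %v, using the raw value\n", name, err)
		return raw
	}
	return decoded
}

func main() {
	// A well-formed escape sequence is decoded.
	fmt.Println(unescapeOrRaw("__biz", "MzI3NzQzODQ5OA%3D%3D"))
	// A malformed escape sequence is returned unchanged.
	fmt.Println(unescapeOrRaw("key", "abc%zz"))
}
```

One caveat worth checking against real tokens: `url.QueryUnescape` also turns `+` into a space, which may not be wanted for base64-like values that were never percent-encoded.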
@@ -1,27 +0,0 @@
package main

import (
	"fmt"
	"os"

	"github.com/wechat-crawler/pkg/wechat"
)

func main() {
	fmt.Println("Starting the article content extraction test...")

	// Create a simple crawler instance
	crawler := wechat.NewSimpleCrawler()

	// Set the official account name (adjust it to your case)
	officialAccountName := "验证"

	// Call GetListArticleFromFile to run the test
	err := crawler.GetListArticleFromFile(officialAccountName, false, true)
	if err != nil {
		fmt.Printf("Test failed: %v\n", err)
		os.Exit(1)
	}

	fmt.Println("Test finished. Please check that the article content was extracted correctly.")
}
@@ -1 +0,0 @@
__biz=MzUxMjA4MTI0MjA1; uin=MTIzNDU2Nzg5; key=abcdef1234567890abcdef1234567890; pass_ticket=abcdefghijklmnopqrstuvwxyz1234567890; version=63090b13; wxtype=1; pass_ticket=abcdefghijklmnopqrstuvwxyz1234567890;
@@ -1,12 +0,0 @@
Rename this file to cookie.txt and fill in the cookie information from the WeChat Official Accounts Platform

How to get the cookie:
1. Open a browser and log in to the WeChat Official Accounts Platform
2. Press F12 to open the developer tools
3. Switch to the Network tab
4. Refresh the page or visit any page
5. Select a request and look at the Cookie in its Headers
6. Copy the complete Cookie into this file

Cookie format example:
__biz=MzUxMjA4MTI0MjA1; uin=MTIzNDU2Nzg5; key=abcdef1234567890abcdef1234567890; pass_ticket=abcdefghijklmnopqrstuvwxyz1234567890; version=63090b13; wxtype=1; pass_ticket=abcdefghijklmnopqrstuvwxyz1234567890;
36624
backend/debug_article_raw.html
Normal file
File diff suppressed because one or more lines are too long
246
backend/examples/database_example.go
Normal file
@@ -0,0 +1,246 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"log"
|
||||
|
||||
"github.com/wechat-crawler/pkg/database"
|
||||
)
|
||||
|
||||
func main() {
|
||||
fmt.Println("==============================================")
|
||||
fmt.Println(" 微信公众号文章数据库管理系统示例")
|
||||
fmt.Println("==============================================")
fmt.Println()
|
||||
|
||||
// 1. 初始化数据库
|
||||
db, err := database.InitDB("../data/wechat_articles.db")
|
||||
if err != nil {
|
||||
log.Fatal("数据库初始化失败:", err)
|
||||
}
|
||||
defer db.Close()
|
||||
|
||||
// 2. 创建仓库实例
|
||||
officialRepo := database.NewOfficialAccountRepository(db)
|
||||
articleRepo := database.NewArticleRepository(db)
|
||||
contentRepo := database.NewArticleContentRepository(db)
|
||||
|
||||
// 3. 示例:添加公众号
|
||||
fmt.Println("📝 示例1: 添加公众号信息")
|
||||
official := &database.OfficialAccount{
|
||||
Biz: "MzI1NjEwMTM4OA==",
|
||||
Nickname: "研招网资讯",
|
||||
Homepage: "https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=MzI1NjEwMTM4OA==&scene=124",
|
||||
Description: "中国研究生招生信息网官方公众号",
|
||||
}
|
||||
|
||||
// 检查是否已存在
|
||||
existing, err := officialRepo.GetByBiz(official.Biz)
|
||||
if err != nil {
|
||||
log.Fatal("查询公众号失败:", err)
|
||||
}
|
||||
|
||||
var officialID int64
|
||||
if existing == nil {
|
||||
// 不存在,创建新记录
|
||||
officialID, err = officialRepo.Create(official)
|
||||
if err != nil {
|
||||
log.Fatal("创建公众号失败:", err)
|
||||
}
|
||||
fmt.Printf("✅ 成功创建公众号: %s (ID: %d)\n\n", official.Nickname, officialID)
|
||||
} else {
|
||||
// 已存在
|
||||
officialID = existing.ID
|
||||
fmt.Printf("ℹ️ 公众号已存在: %s (ID: %d)\n\n", existing.Nickname, officialID)
|
||||
}
|
||||
|
||||
// 4. 示例:添加文章
|
||||
fmt.Println("📝 示例2: 添加文章信息")
|
||||
article := &database.Article{
|
||||
OfficialID: officialID,
|
||||
Title: "专家分析2026年考研报名人数",
|
||||
Author: "研招网资讯",
|
||||
Link: "https://mp.weixin.qq.com/s?__biz=MzI1NjEwMTM4OA==&mid=2651232405&idx=1",
|
||||
PublishTime: "2024-11-27 10:00:00",
|
||||
CreateTime: "2024-11-27 15:30:00",
|
||||
CommentID: "2247491372",
|
||||
ReadNum: 15234,
|
||||
LikeNum: 456,
|
||||
ShareNum: 123,
|
||||
ContentPreview: "根据最新统计数据显示,2026年全国硕士研究生报名人数预计将达到新高...",
|
||||
ParagraphCount: 15,
|
||||
}
|
||||
|
||||
// 检查文章是否已存在
|
||||
existingArticle, err := articleRepo.GetByLink(article.Link)
|
||||
if err != nil {
|
||||
log.Fatal("查询文章失败:", err)
|
||||
}
|
||||
|
||||
var articleID int64
|
||||
if existingArticle == nil {
|
||||
articleID, err = articleRepo.Create(article)
|
||||
if err != nil {
|
||||
log.Fatal("创建文章失败:", err)
|
||||
}
|
||||
fmt.Printf("✅ 成功创建文章: %s (ID: %d)\n\n", article.Title, articleID)
|
||||
} else {
|
||||
articleID = existingArticle.ID
|
||||
fmt.Printf("ℹ️ 文章已存在: %s (ID: %d)\n\n", existingArticle.Title, articleID)
|
||||
}
|
||||
|
||||
// 5. 示例:添加文章内容
|
||||
fmt.Println("📝 示例3: 添加文章详细内容")
|
||||
|
||||
paragraphs := []string{
|
||||
"根据最新统计数据显示,2026年全国硕士研究生报名人数预计将达到新高。",
|
||||
"教育部相关负责人表示,随着社会对高层次人才需求的增加,考研热度持续上升。",
|
||||
"专家建议考生理性选择,注重提升自身综合素质。",
|
||||
}
|
||||
|
||||
images := []string{
|
||||
"https://mmbiz.qpic.cn/mmbiz_jpg/xxx1.jpg",
|
||||
"https://mmbiz.qpic.cn/mmbiz_jpg/xxx2.jpg",
|
||||
}
|
||||
|
||||
content := &database.ArticleContent{
|
||||
ArticleID: articleID,
|
||||
HtmlContent: "<div>文章HTML内容</div>",
|
||||
TextContent: "文章纯文本内容...",
|
||||
Paragraphs: database.StringsToJSON(paragraphs),
|
||||
Images: database.StringsToJSON(images),
|
||||
}
|
||||
|
||||
// 检查内容是否已存在
|
||||
existingContent, err := contentRepo.GetByArticleID(articleID)
|
||||
if err != nil {
|
||||
log.Fatal("查询文章内容失败:", err)
|
||||
}
|
||||
|
||||
if existingContent == nil {
|
||||
contentID, err := contentRepo.Create(content)
|
||||
if err != nil {
|
||||
log.Fatal("创建文章内容失败:", err)
|
||||
}
|
||||
fmt.Printf("✅ 成功添加文章内容 (ID: %d)\n\n", contentID)
|
||||
} else {
|
||||
fmt.Printf("ℹ️ 文章内容已存在 (ID: %d)\n\n", existingContent.ID)
|
||||
}
|
||||
|
||||
// 6. 示例:查询文章列表
|
||||
fmt.Println("📋 示例4: 查询文章列表")
|
||||
articles, total, err := articleRepo.List(officialID, 1, 10)
|
||||
if err != nil {
|
||||
log.Fatal("查询文章列表失败:", err)
|
||||
}
|
||||
|
||||
fmt.Printf("共找到 %d 篇文章:\n", total)
|
||||
for i, item := range articles {
|
||||
fmt.Printf("%d. %s (👁️ %d | 👍 %d)\n", i+1, item.Title, item.ReadNum, item.LikeNum)
|
||||
}
|
||||
fmt.Println()
|
||||
|
||||
// 7. 示例:获取文章详情
|
||||
fmt.Println("📖 示例5: 获取文章详情")
|
||||
detail, err := contentRepo.GetArticleDetail(articleID)
|
||||
if err != nil {
|
||||
log.Fatal("获取文章详情失败:", err)
|
||||
}
|
||||
|
||||
if detail != nil {
|
||||
fmt.Printf("标题: %s\n", detail.Title)
|
||||
fmt.Printf("作者: %s\n", detail.Author)
|
||||
fmt.Printf("公众号: %s\n", detail.OfficialName)
|
||||
fmt.Printf("发布时间: %s\n", detail.PublishTime)
|
||||
fmt.Printf("阅读数: %d | 点赞数: %d\n", detail.ReadNum, detail.LikeNum)
|
||||
fmt.Printf("段落数: %d\n", len(detail.Paragraphs))
|
||||
fmt.Printf("图片数: %d\n", len(detail.Images))
|
||||
if len(detail.Paragraphs) > 0 {
|
||||
fmt.Printf("第一段: %s\n", detail.Paragraphs[0])
|
||||
}
|
||||
}
|
||||
fmt.Println()
|
||||
|
||||
// 8. 示例:搜索文章
|
||||
fmt.Println("🔍 示例6: 搜索文章")
|
||||
searchResults, searchTotal, err := articleRepo.Search("考研", 1, 10)
|
||||
if err != nil {
|
||||
log.Fatal("搜索文章失败:", err)
|
||||
}
|
||||
|
||||
fmt.Printf("搜索\"考研\"找到 %d 篇文章:\n", searchTotal)
|
||||
for i, item := range searchResults {
|
||||
fmt.Printf("%d. %s\n", i+1, item.Title)
|
||||
}
|
||||
fmt.Println()
|
||||
|
||||
// 9. 示例:获取统计信息
|
||||
fmt.Println("📊 示例7: 获取统计信息")
|
||||
stats, err := db.GetStatistics()
|
||||
if err != nil {
|
||||
log.Fatal("获取统计信息失败:", err)
|
||||
}
|
||||
|
||||
fmt.Printf("公众号总数: %d\n", stats.TotalOfficials)
|
||||
fmt.Printf("文章总数: %d\n", stats.TotalArticles)
|
||||
fmt.Printf("总阅读数: %d\n", stats.TotalReadNum)
|
||||
fmt.Printf("总点赞数: %d\n", stats.TotalLikeNum)
|
||||
fmt.Println()
|
||||
|
||||
// 10. 示例:批量插入文章
|
||||
fmt.Println("📦 示例8: 批量插入文章")
|
||||
batchArticles := []*database.Article{
|
||||
{
|
||||
OfficialID: officialID,
|
||||
Title: "教育部:2026年全国硕士研究生报名人数为343万",
|
||||
Author: "研招网资讯",
|
||||
Link: "https://mp.weixin.qq.com/s?__biz=MzI1NjEwMTM4OA==&mid=2651232406",
|
||||
PublishTime: "2024-11-26 09:00:00",
|
||||
ReadNum: 8965,
|
||||
LikeNum: 234,
|
||||
ContentPreview: "教育部公布2026年研究生招生数据...",
|
||||
ParagraphCount: 12,
|
||||
},
|
||||
{
|
||||
OfficialID: officialID,
|
||||
Title: "研考网上确认成功后,需重点关注四件事",
|
||||
Author: "研招网资讯",
|
||||
Link: "https://mp.weixin.qq.com/s?__biz=MzI1NjEwMTM4OA==&mid=2651232407",
|
||||
PublishTime: "2024-11-25 15:30:00",
|
||||
ReadNum: 6543,
|
||||
LikeNum: 189,
|
||||
ContentPreview: "网上确认通过后,考生还需要注意以下事项...",
|
||||
ParagraphCount: 8,
|
||||
},
|
||||
}
|
||||
|
||||
err = articleRepo.BatchInsertArticles(batchArticles)
|
||||
if err != nil {
|
||||
log.Fatal("批量插入文章失败:", err)
|
||||
}
|
||||
fmt.Printf("✅ 成功批量插入 %d 篇文章\n\n", len(batchArticles))
|
||||
|
||||
// 11. 示例:导出JSON数据
|
||||
fmt.Println("💾 示例9: 导出文章列表为JSON")
|
||||
allArticles, _, err := articleRepo.List(0, 1, 100)
|
||||
if err != nil {
|
||||
log.Fatal("查询文章列表失败:", err)
|
||||
}
|
||||
|
||||
jsonData, err := json.MarshalIndent(allArticles, "", " ")
|
||||
if err != nil {
|
||||
log.Fatal("JSON序列化失败:", err)
|
||||
}
|
||||
|
||||
fmt.Println("文章列表JSON (前200字符):")
|
||||
if len(jsonData) > 200 {
|
||||
fmt.Println(string(jsonData[:200]) + "...")
|
||||
} else {
|
||||
fmt.Println(string(jsonData))
|
||||
}
|
||||
fmt.Println()
|
||||
|
||||
fmt.Println("==============================================")
|
||||
fmt.Println(" 数据库操作示例演示完成!")
|
||||
fmt.Println("==============================================")
|
||||
}
|
||||
@@ -1,7 +0,0 @@
module github.com/wechat-crawler

go 1.20

require github.com/go-resty/resty/v2 v2.10.0

require golang.org/x/net v0.17.0 // indirect
@@ -1,44 +0,0 @@
|
||||
github.com/go-resty/resty/v2 v2.10.0 h1:Qla4W/+TMmv0fOeeRqzEpXPLfTUnR5HZ1+lGs+CkiCo=
|
||||
github.com/go-resty/resty/v2 v2.10.0/go.mod h1:iiP/OpA0CkcL3IGt1O0+/SIItFUbkkyw5BGXiVdTu+A=
|
||||
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
|
||||
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
|
||||
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
|
||||
golang.org/x/crypto v0.14.0/go.mod h1:MVFd36DqK4CsrnJYDkBA3VC4m2GkXAM0PvzMCn4JQf4=
|
||||
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
|
||||
golang.org/x/mod v0.8.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
|
||||
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
|
||||
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
|
||||
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
|
||||
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
|
||||
golang.org/x/net v0.10.0/go.mod h1:0qNGK6F8kojg2nk9dLZ2mShWaEBan6FAoqfSigmmuDg=
|
||||
golang.org/x/net v0.17.0 h1:pVaXccu2ozPjCXewfr1S7xza/zcXTity9cCdXQYSjIM=
|
||||
golang.org/x/net v0.17.0/go.mod h1:NxSsAGuq816PNPmqtQdLE42eU2Fs7NoRIZrHJAlaCOE=
|
||||
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
|
||||
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
|
||||
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
|
||||
golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.8.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/sys v0.13.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
|
||||
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
|
||||
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
|
||||
golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k=
|
||||
golang.org/x/term v0.8.0/go.mod h1:xPskH00ivmX89bAKVGSKKtLOWNx2+17Eiy94tnKShWo=
|
||||
golang.org/x/term v0.13.0/go.mod h1:LTmsnFJwVN6bCy1rVCoS+qHT1HhALEFxKncY3WNNh4U=
|
||||
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
|
||||
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
|
||||
golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
|
||||
golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
|
||||
golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
|
||||
golang.org/x/text v0.13.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
|
||||
golang.org/x/time v0.3.0 h1:rg5rLMjNzMS1RkNLzCG38eapWhnYLFYXDXj2gOlr8j4=
|
||||
golang.org/x/time v0.3.0/go.mod h1:tRJNPiyCQ0inRvYxbN9jk5I+vvW/OXSQhTDSoE431IQ=
|
||||
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
|
||||
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
|
||||
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
|
||||
golang.org/x/tools v0.6.0/go.mod h1:Xwgl3UAJ/d3gWutnCtw505GrjyAbvKui8lOU390QaIU=
|
||||
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
|
||||
BIN
backend/main.exe
Binary file not shown.
117
backend/pkg/database/db.go
Normal file
@@ -0,0 +1,117 @@
package database

import (
	"database/sql"
	"fmt"
	"os"
	"path/filepath"

	_ "modernc.org/sqlite"
)

// DB wraps the database handle
type DB struct {
	*sql.DB
}

// InitDB opens the database and creates the tables if they do not exist
func InitDB(dbPath string) (*DB, error) {
	// Make sure the database directory exists
	dbDir := filepath.Dir(dbPath)
	if err := os.MkdirAll(dbDir, 0755); err != nil {
		return nil, fmt.Errorf("failed to create database directory: %w", err)
	}

	// Open the database connection (using the modernc.org/sqlite driver)
	db, err := sql.Open("sqlite", dbPath)
	if err != nil {
		return nil, fmt.Errorf("failed to open database: %w", err)
	}

	// Test the connection; close the handle on failure so it is not leaked
	if err := db.Ping(); err != nil {
		db.Close()
		return nil, fmt.Errorf("database connection test failed: %w", err)
	}

	// Create the tables; close the handle on failure so it is not leaked
	if err := createTables(db); err != nil {
		db.Close()
		return nil, fmt.Errorf("failed to create tables: %w", err)
	}

	fmt.Println("✅ Database initialized:", dbPath)
	return &DB{db}, nil
}

// createTables creates the tables and indexes
func createTables(db *sql.DB) error {
	// Official accounts table
	officialAccountTable := `
	CREATE TABLE IF NOT EXISTS official_accounts (
		id INTEGER PRIMARY KEY AUTOINCREMENT,
		biz TEXT NOT NULL UNIQUE,
		nickname TEXT NOT NULL,
		homepage TEXT,
		description TEXT,
		created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
		updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
	);
	CREATE INDEX IF NOT EXISTS idx_biz ON official_accounts(biz);
	CREATE INDEX IF NOT EXISTS idx_nickname ON official_accounts(nickname);
	`

	// Articles table
	articleTable := `
	CREATE TABLE IF NOT EXISTS articles (
		id INTEGER PRIMARY KEY AUTOINCREMENT,
		official_id INTEGER NOT NULL,
		title TEXT NOT NULL,
		author TEXT,
		link TEXT UNIQUE,
		publish_time TEXT,
		create_time TEXT,
		comment_id TEXT,
		read_num INTEGER DEFAULT 0,
		like_num INTEGER DEFAULT 0,
		share_num INTEGER DEFAULT 0,
		content_preview TEXT,
		paragraph_count INTEGER DEFAULT 0,
		created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
		updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
		FOREIGN KEY (official_id) REFERENCES official_accounts(id)
	);
	CREATE INDEX IF NOT EXISTS idx_official_id ON articles(official_id);
	CREATE INDEX IF NOT EXISTS idx_title ON articles(title);
	CREATE INDEX IF NOT EXISTS idx_publish_time ON articles(publish_time);
	CREATE INDEX IF NOT EXISTS idx_link ON articles(link);
	`

	// Article contents table
	articleContentTable := `
	CREATE TABLE IF NOT EXISTS article_contents (
		id INTEGER PRIMARY KEY AUTOINCREMENT,
		article_id INTEGER NOT NULL UNIQUE,
		html_content TEXT,
		text_content TEXT,
		paragraphs TEXT,
		images TEXT,
		created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
		FOREIGN KEY (article_id) REFERENCES articles(id) ON DELETE CASCADE
	);
	CREATE INDEX IF NOT EXISTS idx_article_id ON article_contents(article_id);
	`

	// Run the CREATE TABLE statements
	tables := []string{officialAccountTable, articleTable, articleContentTable}
	for _, table := range tables {
		if _, err := db.Exec(table); err != nil {
			return err
		}
	}

	return nil
}

// Close closes the database connection
func (db *DB) Close() error {
	return db.DB.Close()
}
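Review note: the schema declares `FOREIGN KEY ... ON DELETE CASCADE`, but SQLite only enforces foreign keys when `PRAGMA foreign_keys` is switched on for the connection, and it is off by default. A minimal sketch of building a DSN that requests this follows. It assumes the driver honours a `_pragma` query parameter, as the modernc.org/sqlite documentation describes; the helper name `foreignKeysDSN` is hypothetical and the behaviour should be verified against the driver version in use.

```go
package main

import "fmt"

// foreignKeysDSN is a hypothetical helper (not part of the diff).
// It builds a DSN asking the driver to enable foreign key enforcement
// on every new connection.
// Assumption: the driver understands "_pragma=foreign_keys(1)".
// Assumption: dbPath contains no '?' or '#', which would need escaping.
func foreignKeysDSN(dbPath string) string {
	return "file:" + dbPath + "?_pragma=foreign_keys(1)"
}

func main() {
	// The result would be passed to sql.Open("sqlite", dsn) in place of the bare path.
	fmt.Println(foreignKeysDSN("../data/wechat_articles.db"))
}
```

Setting the pragma through the DSN rather than a one-off `Exec` matters because `database/sql` pools connections, and a pragma issued once only affects the connection that ran it.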
76
backend/pkg/database/models.go
Normal file
@@ -0,0 +1,76 @@
package database

import (
	"time"
)

// OfficialAccount holds official account information
type OfficialAccount struct {
	ID          int64     `json:"id"`
	Biz         string    `json:"biz"`         // unique identifier of the official account
	Nickname    string    `json:"nickname"`    // official account name
	Homepage    string    `json:"homepage"`    // official account homepage link
	Description string    `json:"description"` // official account description
	CreatedAt   time.Time `json:"created_at"`  // creation time
	UpdatedAt   time.Time `json:"updated_at"`  // update time
}

// Article holds article information
type Article struct {
	ID             int64     `json:"id"`
	OfficialID     int64     `json:"official_id"`     // ID of the associated official account
	Title          string    `json:"title"`           // article title
	Author         string    `json:"author"`          // author
	Link           string    `json:"link"`            // article link
	PublishTime    string    `json:"publish_time"`    // publish time
	CreateTime     string    `json:"create_time"`     // creation time (time of crawling)
	CommentID      string    `json:"comment_id"`      // comment ID
	ReadNum        int       `json:"read_num"`        // read count
	LikeNum        int       `json:"like_num"`        // like count
	ShareNum       int       `json:"share_num"`       // share count
	ContentPreview string    `json:"content_preview"` // content preview (first 200 characters)
	ParagraphCount int       `json:"paragraph_count"` // paragraph count
	CreatedAt      time.Time `json:"created_at"`      // database creation time
	UpdatedAt      time.Time `json:"updated_at"`      // database update time
}

// ArticleContent holds the full content of an article
type ArticleContent struct {
	ID          int64     `json:"id"`
	ArticleID   int64     `json:"article_id"`   // ID of the associated article
	HtmlContent string    `json:"html_content"` // raw HTML content
	TextContent string    `json:"text_content"` // plain-text content
	Paragraphs  string    `json:"paragraphs"`   // paragraphs (JSON array)
	Images      string    `json:"images"`       // image links (JSON array)
	CreatedAt   time.Time `json:"created_at"`   // creation time
}

// ArticleListItem is an article list entry (used in API responses)
type ArticleListItem struct {
	ID             int64  `json:"id"`
	Title          string `json:"title"`
	Author         string `json:"author"`
	PublishTime    string `json:"publish_time"`
	ReadNum        int    `json:"read_num"`
	LikeNum        int    `json:"like_num"`
	OfficialName   string `json:"official_name"`
	ContentPreview string `json:"content_preview"`
}

// ArticleDetail is the article detail view (used in API responses)
type ArticleDetail struct {
	Article
	OfficialName string   `json:"official_name"`
	HtmlContent  string   `json:"html_content"`
	TextContent  string   `json:"text_content"`
	Paragraphs   []string `json:"paragraphs"`
	Images       []string `json:"images"`
}

// Statistics holds aggregate statistics
type Statistics struct {
	TotalOfficials int `json:"total_officials"` // total number of official accounts
	TotalArticles  int `json:"total_articles"`  // total number of articles
	TotalReadNum   int `json:"total_read_num"`  // total read count
	TotalLikeNum   int `json:"total_like_num"`  // total like count
}
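Review note: `ArticleContent` stores `Paragraphs` and `Images` as JSON array strings, while `ArticleDetail` exposes them as `[]string`, and the example program calls `database.StringsToJSON`, whose body is not visible in this part of the diff. The sketch below shows one plausible shape for that conversion pair using `encoding/json`. The lower-case names `stringsToJSON` and `jsonToStrings` are stand-ins chosen for illustration and are not the project's actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stringsToJSON encodes a string slice as a JSON array.
// A nil slice is encoded as "[]" instead of "null" so the column
// always holds an array. (Hypothetical helper, not from the diff.)
func stringsToJSON(items []string) string {
	if items == nil {
		items = []string{}
	}
	data, err := json.Marshal(items)
	if err != nil {
		return "[]"
	}
	return string(data)
}

// jsonToStrings decodes a JSON array string back into a slice.
// An empty column value is treated as an empty list. (Hypothetical helper.)
func jsonToStrings(raw string) ([]string, error) {
	if raw == "" {
		return []string{}, nil
	}
	var items []string
	if err := json.Unmarshal([]byte(raw), &items); err != nil {
		return nil, err
	}
	return items, nil
}

func main() {
	encoded := stringsToJSON([]string{"first paragraph", "second paragraph"})
	fmt.Println(encoded)

	decoded, err := jsonToStrings(encoded)
	fmt.Println(len(decoded), err)
}
```

Normalising nil to `[]` keeps API responses from alternating between `null` and an array for the same field.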
455
backend/pkg/database/repository.go
Normal file
@@ -0,0 +1,455 @@
|
||||
package database
|
||||
|
||||
import (
|
||||
"database/sql"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"strings"
|
||||
)
|
||||
|
||||
// OfficialAccountRepository 公众号数据仓库
|
||||
type OfficialAccountRepository struct {
|
||||
db *DB
|
||||
}
|
||||
|
||||
// NewOfficialAccountRepository 创建公众号仓库
|
||||
func NewOfficialAccountRepository(db *DB) *OfficialAccountRepository {
|
||||
return &OfficialAccountRepository{db: db}
|
||||
}
|
||||
|
||||
// Create 创建公众号
|
||||
func (r *OfficialAccountRepository) Create(account *OfficialAccount) (int64, error) {
|
||||
result, err := r.db.Exec(`
|
||||
INSERT INTO official_accounts (biz, nickname, homepage, description)
|
||||
VALUES (?, ?, ?, ?)
|
||||
`, account.Biz, account.Nickname, account.Homepage, account.Description)
|
||||
|
||||
if err != nil {
|
||||
return 0, err
|
||||
}
|
||||
|
||||
return result.LastInsertId()
|
||||
}
|
||||
|
||||
// GetByBiz 根据Biz获取公众号
|
||||
func (r *OfficialAccountRepository) GetByBiz(biz string) (*OfficialAccount, error) {
|
||||
account := &OfficialAccount{}
|
||||
err := r.db.QueryRow(`
|
||||
SELECT id, biz, nickname, homepage, description, created_at, updated_at
|
||||
FROM official_accounts WHERE biz = ?
|
||||
`, biz).Scan(&account.ID, &account.Biz, &account.Nickname, &account.Homepage,
|
||||
&account.Description, &account.CreatedAt, &account.UpdatedAt)
|
||||
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
}
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
return account, nil
|
||||
}
|
||||
|
||||
// GetByID 根据ID获取公众号
|
||||
func (r *OfficialAccountRepository) GetByID(id int64) (*OfficialAccount, error) {
|
||||
account := &OfficialAccount{}
|
||||
err := r.db.QueryRow(`
|
||||
SELECT id, biz, nickname, homepage, description, created_at, updated_at
|
||||
FROM official_accounts WHERE id = ?
|
||||
`, id).Scan(&account.ID, &account.Biz, &account.Nickname, &account.Homepage,
|
||||
&account.Description, &account.CreatedAt, &account.UpdatedAt)
|
||||
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
}
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
return account, nil
|
||||
}
|
||||
|
||||
// List 获取所有公众号列表
|
||||
func (r *OfficialAccountRepository) List() ([]*OfficialAccount, error) {
|
||||
rows, err := r.db.Query(`
|
||||
SELECT id, biz, nickname, homepage, description, created_at, updated_at
|
||||
FROM official_accounts ORDER BY created_at DESC
|
||||
`)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
defer rows.Close()
|
||||
|
||||
var accounts []*OfficialAccount
|
||||
for rows.Next() {
|
||||
account := &OfficialAccount{}
|
||||
err := rows.Scan(&account.ID, &account.Biz, &account.Nickname, &account.Homepage,
|
||||
&account.Description, &account.CreatedAt, &account.UpdatedAt)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
accounts = append(accounts, account)
|
||||
}
|
||||
|
||||
return accounts, nil
|
||||
}
|
||||
|
||||
// Update 更新公众号信息
|
||||
func (r *OfficialAccountRepository) Update(account *OfficialAccount) error {
|
||||
_, err := r.db.Exec(`
|
||||
UPDATE official_accounts
|
||||
SET nickname = ?, homepage = ?, description = ?, updated_at = CURRENT_TIMESTAMP
|
||||
WHERE id = ?
|
||||
`, account.Nickname, account.Homepage, account.Description, account.ID)
|
||||
|
||||
return err
|
||||
}
|
||||
|
||||
// ArticleRepository 文章数据仓库
|
||||
type ArticleRepository struct {
|
||||
db *DB
|
||||
}
|
||||
|
||||
// NewArticleRepository 创建文章仓库
|
||||
func NewArticleRepository(db *DB) *ArticleRepository {
|
||||
return &ArticleRepository{db: db}
|
||||
}
|
||||
|
||||
// Create 创建文章
|
||||
func (r *ArticleRepository) Create(article *Article) (int64, error) {
|
||||
result, err := r.db.Exec(`
|
||||
INSERT INTO articles (
|
||||
official_id, title, author, link, publish_time, create_time,
|
||||
comment_id, read_num, like_num, share_num, content_preview, paragraph_count
|
||||
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
`, article.OfficialID, article.Title, article.Author, article.Link,
|
||||
article.PublishTime, article.CreateTime, article.CommentID,
|
||||
article.ReadNum, article.LikeNum, article.ShareNum,
|
||||
article.ContentPreview, article.ParagraphCount)
|
||||
|
||||
if err != nil {
|
||||
return 0, err
|
||||
}
|
||||
|
||||
return result.LastInsertId()
|
||||
}
|
||||
|
||||
// GetByID 根据ID获取文章
|
||||
func (r *ArticleRepository) GetByID(id int64) (*Article, error) {
|
||||
article := &Article{}
|
||||
err := r.db.QueryRow(`
|
||||
SELECT id, official_id, title, author, link, publish_time, create_time,
|
||||
comment_id, read_num, like_num, share_num, content_preview,
|
||||
paragraph_count, created_at, updated_at
|
||||
FROM articles WHERE id = ?
|
||||
`, id).Scan(&article.ID, &article.OfficialID, &article.Title, &article.Author,
|
||||
&article.Link, &article.PublishTime, &article.CreateTime, &article.CommentID,
|
||||
&article.ReadNum, &article.LikeNum, &article.ShareNum, &article.ContentPreview,
|
||||
&article.ParagraphCount, &article.CreatedAt, &article.UpdatedAt)
|
||||
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
}
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
return article, nil
|
||||
}
|
||||
|
||||
// GetByLink 根据链接获取文章
|
||||
func (r *ArticleRepository) GetByLink(link string) (*Article, error) {
|
||||
article := &Article{}
|
||||
err := r.db.QueryRow(`
|
||||
SELECT id, official_id, title, author, link, publish_time, create_time,
|
||||
comment_id, read_num, like_num, share_num, content_preview,
|
||||
paragraph_count, created_at, updated_at
|
||||
FROM articles WHERE link = ?
|
||||
`, link).Scan(&article.ID, &article.OfficialID, &article.Title, &article.Author,
|
||||
&article.Link, &article.PublishTime, &article.CreateTime, &article.CommentID,
|
||||
&article.ReadNum, &article.LikeNum, &article.ShareNum, &article.ContentPreview,
|
||||
&article.ParagraphCount, &article.CreatedAt, &article.UpdatedAt)
|
||||
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
}
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
return article, nil
|
||||
}
|
||||
|
||||
// List 获取文章列表(分页)
|
||||
func (r *ArticleRepository) List(officialID int64, page, pageSize int) ([]*ArticleListItem, int, error) {
|
||||
// 构建查询条件
|
||||
whereClause := ""
|
||||
args := []interface{}{}
|
||||
|
||||
if officialID > 0 {
|
||||
whereClause = "WHERE a.official_id = ?"
|
||||
args = append(args, officialID)
|
||||
}
|
||||
|
||||
// 获取总数
|
||||
countQuery := fmt.Sprintf("SELECT COUNT(*) FROM articles a %s", whereClause)
|
||||
var total int
|
||||
err := r.db.QueryRow(countQuery, args...).Scan(&total)
|
||||
if err != nil {
|
||||
return nil, 0, err
|
||||
}
|
||||
|
||||
// 获取列表
|
||||
offset := (page - 1) * pageSize
|
||||
listQuery := fmt.Sprintf(`
|
||||
SELECT a.id, a.title, a.author, a.publish_time, a.read_num, a.like_num,
|
||||
a.content_preview, COALESCE(o.nickname, '')
|
||||
FROM articles a
|
||||
LEFT JOIN official_accounts o ON a.official_id = o.id
|
||||
%s
|
||||
ORDER BY a.publish_time DESC
|
||||
LIMIT ? OFFSET ?
|
||||
`, whereClause)
|
||||
|
||||
args = append(args, pageSize, offset)
|
||||
rows, err := r.db.Query(listQuery, args...)
|
||||
if err != nil {
|
||||
return nil, 0, err
|
||||
}
|
||||
defer rows.Close()
|
||||
|
||||
var items []*ArticleListItem
|
||||
for rows.Next() {
|
||||
item := &ArticleListItem{}
|
||||
err := rows.Scan(&item.ID, &item.Title, &item.Author, &item.PublishTime,
|
||||
&item.ReadNum, &item.LikeNum, &item.ContentPreview, &item.OfficialName)
|
||||
if err != nil {
|
||||
return nil, 0, err
|
||||
}
|
||||
items = append(items, item)
|
||||
}
|
||||
|
||||
return items, total, nil
|
||||
}
|
||||
|
||||
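Review note: `List` computes `offset := (page - 1) * pageSize` without checking its inputs, so a caller passing `page = 0` or a non-positive `pageSize` sends values to `LIMIT ? OFFSET ?` that it probably did not intend. A minimal sketch of normalising the inputs first is shown below. The helper name `pageBounds`, the default page size of 10 and the cap of 100 are all assumptions for illustration.

```go
package main

import "fmt"

// pageBounds converts a 1-based page number and a page size into
// LIMIT and OFFSET values. (Hypothetical helper, not from the diff.)
func pageBounds(page, pageSize int) (limit, offset int) {
	if page < 1 {
		page = 1 // treat missing or invalid pages as the first page
	}
	if pageSize < 1 {
		pageSize = 10 // assumed default page size
	}
	if pageSize > 100 {
		pageSize = 100 // assumed upper bound to keep responses small
	}
	return pageSize, (page - 1) * pageSize
}

func main() {
	limit, offset := pageBounds(3, 20)
	fmt.Println(limit, offset)
}
```

The same helper could serve `Search`, which repeats the offset calculation.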
// Search 搜索文章
|
||||
func (r *ArticleRepository) Search(keyword string, page, pageSize int) ([]*ArticleListItem, int, error) {
|
||||
keyword = "%" + keyword + "%"
|
||||
|
||||
// 获取总数
|
||||
var total int
|
||||
err := r.db.QueryRow(`
|
||||
SELECT COUNT(*) FROM articles WHERE title LIKE ? OR author LIKE ?
|
||||
`, keyword, keyword).Scan(&total)
|
||||
if err != nil {
|
||||
return nil, 0, err
|
||||
}
|
||||
|
||||
// 获取列表
|
||||
offset := (page - 1) * pageSize
|
||||
rows, err := r.db.Query(`
|
||||
SELECT a.id, a.title, a.author, a.publish_time, a.read_num, a.like_num,
|
||||
a.content_preview, COALESCE(o.nickname, '')
|
||||
FROM articles a
|
||||
LEFT JOIN official_accounts o ON a.official_id = o.id
|
||||
WHERE a.title LIKE ? OR a.author LIKE ?
|
||||
ORDER BY a.publish_time DESC
|
||||
LIMIT ? OFFSET ?
|
||||
`, keyword, keyword, pageSize, offset)
|
||||
if err != nil {
|
||||
return nil, 0, err
|
||||
}
|
||||
defer rows.Close()
|
||||
|
||||
var items []*ArticleListItem
|
||||
for rows.Next() {
|
||||
item := &ArticleListItem{}
|
||||
err := rows.Scan(&item.ID, &item.Title, &item.Author, &item.PublishTime,
|
||||
&item.ReadNum, &item.LikeNum, &item.ContentPreview, &item.OfficialName)
|
||||
if err != nil {
|
||||
return nil, 0, err
|
||||
}
|
||||
items = append(items, item)
|
||||
}
|
||||
|
||||
return items, total, nil
|
||||
}
|
||||
|
||||
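Review note: `Search` wraps the keyword in `%...%` and binds it as a parameter, which is safe against SQL injection, but a keyword that itself contains `%` or `_` is still interpreted as a LIKE wildcard. The sketch below shows one way to escape those characters before building the pattern. It assumes the queries would also be changed to `LIKE ? ESCAPE '\'`, which the current diff does not do; the names `likeEscaper` and `containsPattern` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// likeEscaper prefixes the escape character and both LIKE wildcards with a
// backslash. The replacer works in a single pass, so the backslashes it
// inserts are not escaped again. (Hypothetical helper, not from the diff.)
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

// containsPattern builds a "contains" pattern for use with
// `LIKE ? ESCAPE '\'`.
func containsPattern(keyword string) string {
	return "%" + likeEscaper.Replace(keyword) + "%"
}

func main() {
	fmt.Println(containsPattern("100%_done"))
}
```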
// Update 更新文章信息
|
||||
func (r *ArticleRepository) Update(article *Article) error {
|
||||
_, err := r.db.Exec(`
|
||||
UPDATE articles
|
||||
SET read_num = ?, like_num = ?, share_num = ?, updated_at = CURRENT_TIMESTAMP
|
||||
WHERE id = ?
|
||||
`, article.ReadNum, article.LikeNum, article.ShareNum, article.ID)
|
||||
|
||||
return err
|
||||
}
|
||||
|
||||
// ArticleContentRepository 文章内容数据仓库
|
||||
type ArticleContentRepository struct {
|
||||
db *DB
|
||||
}
|
||||
|
||||
// NewArticleContentRepository 创建文章内容仓库
|
||||
func NewArticleContentRepository(db *DB) *ArticleContentRepository {
|
||||
return &ArticleContentRepository{db: db}
|
||||
}
|
||||
|
||||
// Create 创建文章内容
|
||||
func (r *ArticleContentRepository) Create(content *ArticleContent) (int64, error) {
|
||||
result, err := r.db.Exec(`
|
||||
INSERT INTO article_contents (article_id, html_content, text_content, paragraphs, images)
|
||||
VALUES (?, ?, ?, ?, ?)
|
||||
`, content.ArticleID, content.HtmlContent, content.TextContent,
|
||||
content.Paragraphs, content.Images)
|
||||
|
||||
if err != nil {
|
||||
return 0, err
|
||||
}
|
||||
|
||||
return result.LastInsertId()
|
||||
}
|
||||
|
||||
// GetByArticleID returns the content for the given article ID, or nil if none exists.
|
||||
func (r *ArticleContentRepository) GetByArticleID(articleID int64) (*ArticleContent, error) {
|
||||
content := &ArticleContent{}
|
||||
err := r.db.QueryRow(`
|
||||
SELECT id, article_id, html_content, text_content, paragraphs, images, created_at
|
||||
FROM article_contents WHERE article_id = ?
|
||||
`, articleID).Scan(&content.ID, &content.ArticleID, &content.HtmlContent,
|
||||
&content.TextContent, &content.Paragraphs, &content.Images, &content.CreatedAt)
|
||||
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
}
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
return content, nil
|
||||
}
|
||||
|
||||
// GetArticleDetail returns an article's details, including its content.
|
||||
func (r *ArticleContentRepository) GetArticleDetail(articleID int64) (*ArticleDetail, error) {
|
||||
detail := &ArticleDetail{}
|
||||
var paragraphsJSON, imagesJSON string
|
||||
|
||||
err := r.db.QueryRow(`
|
||||
SELECT a.id, a.official_id, a.title, a.author, a.link, a.publish_time,
|
||||
a.create_time, a.comment_id, a.read_num, a.like_num, a.share_num,
|
||||
a.content_preview, a.paragraph_count, a.created_at, a.updated_at,
|
||||
o.nickname, c.html_content, c.text_content, COALESCE(c.paragraphs, ''), COALESCE(c.images, '')
|
||||
FROM articles a
|
||||
LEFT JOIN official_accounts o ON a.official_id = o.id
|
||||
LEFT JOIN article_contents c ON a.id = c.article_id
|
||||
WHERE a.id = ?
|
||||
`, articleID).Scan(
|
||||
&detail.ID, &detail.OfficialID, &detail.Title, &detail.Author,
|
||||
&detail.Link, &detail.PublishTime, &detail.CreateTime, &detail.CommentID,
|
||||
&detail.ReadNum, &detail.LikeNum, &detail.ShareNum, &detail.ContentPreview,
|
||||
&detail.ParagraphCount, &detail.CreatedAt, &detail.UpdatedAt,
|
||||
&detail.OfficialName, &detail.HtmlContent, &detail.TextContent,
|
||||
&paragraphsJSON, &imagesJSON,
|
||||
)
|
||||
|
||||
if err == sql.ErrNoRows {
|
||||
return nil, nil
|
||||
}
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
// Parse the JSON arrays
|
||||
if paragraphsJSON != "" {
|
||||
if err := json.Unmarshal([]byte(paragraphsJSON), &detail.Paragraphs); err != nil {
return nil, err
}
|
||||
}
|
||||
if imagesJSON != "" {
|
||||
if err := json.Unmarshal([]byte(imagesJSON), &detail.Images); err != nil {
return nil, err
}
|
||||
}
|
||||
|
||||
return detail, nil
|
||||
}
|
||||
|
||||
// GetStatistics returns aggregate statistics.
|
||||
func (db *DB) GetStatistics() (*Statistics, error) {
|
||||
stats := &Statistics{}
|
||||
|
||||
err := db.QueryRow(`
|
||||
SELECT
|
||||
(SELECT COUNT(*) FROM official_accounts) as total_officials,
|
||||
(SELECT COUNT(*) FROM articles) as total_articles,
|
||||
(SELECT COALESCE(SUM(read_num), 0) FROM articles) as total_read_num,
|
||||
(SELECT COALESCE(SUM(like_num), 0) FROM articles) as total_like_num
|
||||
`).Scan(&stats.TotalOfficials, &stats.TotalArticles, &stats.TotalReadNum, &stats.TotalLikeNum)
|
||||
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
|
||||
return stats, nil
|
||||
}
|
||||
|
||||
// BatchInsertArticles inserts articles in bulk within a single transaction.
|
||||
func (r *ArticleRepository) BatchInsertArticles(articles []*Article) error {
|
||||
if len(articles) == 0 {
|
||||
return nil
|
||||
}
|
||||
|
||||
// Begin the transaction
|
||||
tx, err := r.db.Begin()
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
defer tx.Rollback()
|
||||
|
||||
stmt, err := tx.Prepare(`
|
||||
INSERT OR IGNORE INTO articles (
|
||||
official_id, title, author, link, publish_time, create_time,
|
||||
comment_id, read_num, like_num, share_num, content_preview, paragraph_count
|
||||
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
|
||||
`)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
defer stmt.Close()
|
||||
|
||||
for _, article := range articles {
|
||||
_, err = stmt.Exec(
|
||||
article.OfficialID, article.Title, article.Author, article.Link,
|
||||
article.PublishTime, article.CreateTime, article.CommentID,
|
||||
article.ReadNum, article.LikeNum, article.ShareNum,
|
||||
article.ContentPreview, article.ParagraphCount,
|
||||
)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
|
||||
return tx.Commit()
|
||||
}
|
||||
|
||||
// Helper function: converts a string slice to a JSON string
|
||||
func StringsToJSON(strs []string) string {
|
||||
if len(strs) == 0 {
|
||||
return "[]"
|
||||
}
|
||||
data, _ := json.Marshal(strs)
|
||||
return string(data)
|
||||
}
|
||||
|
||||
// Helper function: generates a content preview
|
||||
func GeneratePreview(content string, maxLen int) string {
|
||||
if len(content) <= maxLen {
|
||||
return content
|
||||
}
|
||||
// Collapse newlines and redundant whitespace
|
||||
content = strings.ReplaceAll(content, "\n", " ")
|
||||
content = strings.ReplaceAll(content, "\r", "")
|
||||
content = strings.Join(strings.Fields(content), " ")
|
||||
|
||||
if len(content) <= maxLen {
|
||||
return content
|
||||
}
|
||||
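// Byte slicing can cut a multi-byte UTF-8 character in half; drop any partial trailing bytes.
return strings.ToValidUTF8(content[:maxLen], "") + "..."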
|
||||
}
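`GeneratePreview` measures `maxLen` in bytes, while article text here is mostly Chinese, where each character takes several bytes in UTF-8. A variant that counts characters (runes) instead might look like the sketch below. The name `previewRunes` and the choice to normalize whitespace before measuring are assumptions for illustration, not the repository's behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// previewRunes is a hypothetical rune-based variant of GeneratePreview.
// It collapses all whitespace runs to single spaces, then keeps at most
// maxRunes characters, appending "..." only when text was cut off.
func previewRunes(content string, maxRunes int) string {
	if maxRunes < 0 {
		maxRunes = 0
	}
	content = strings.Join(strings.Fields(content), " ")
	runes := []rune(content)
	if len(runes) <= maxRunes {
		return content
	}
	return string(runes[:maxRunes]) + "..."
}

func main() {
	fmt.Println(previewRunes("微信公众号\n文章  爬虫", 5)) // prints: 微信公众号...
	fmt.Println(previewRunes("short text", 20))     // prints: short text
}
```

Converting to `[]rune` copies the string, which is acceptable for short previews but worth keeping in mind for very large article bodies.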
|
||||
File diff suppressed because it is too large
@@ -1,48 +0,0 @@
|
||||
@echo off
|
||||
|
||||
echo WeChat Public Article Crawler Startup Script
|
||||
echo =================================
|
||||
|
||||
REM Check if cookie.txt file exists
|
||||
if not exist "cookie.txt" (
|
||||
echo Error: cookie.txt file not found!
|
||||
echo Please create cookie.txt file in backend directory and add WeChat public platform cookie information.
|
||||
echo.
|
||||
echo cookie.txt format example:
|
||||
echo __biz=xxx; uin=xxx; key=xxx; pass_ticket=xxx;
|
||||
echo.
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
REM Set Go environment variables (if needed)
|
||||
REM set GOPATH=%USERPROFILE%\go
|
||||
REM set GOROOT=C:\Go
|
||||
REM set PATH=%PATH%;%GOROOT%\bin;%GOPATH%\bin
|
||||
|
||||
echo Downloading dependencies...
|
||||
go mod tidy
|
||||
if %errorlevel% neq 0 (
|
||||
echo Failed to download dependencies!
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
echo Compiling program...
|
||||
go build -o output\wechat-crawler.exe cmd\main.go
|
||||
if %errorlevel% neq 0 (
|
||||
echo Compilation failed!
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
echo Compilation successful! Starting program...
|
||||
echo.
|
||||
|
||||
REM Ensure data directory exists
|
||||
if not exist "data" mkdir data
|
||||
|
||||
REM Run the program
|
||||
output\wechat-crawler.exe
|
||||
|
||||
pause
|
||||
@@ -1,57 +0,0 @@
|
||||
@echo off
|
||||
|
||||
rem WeChat Official Account Article Crawler - Script for crawling via article link
|
||||
setlocal enabledelayedexpansion
|
||||
|
||||
REM Check whether a command-line argument was passed
|
||||
if "%1" neq "" (
|
||||
REM If an argument is given, pass it to the program as the article link
|
||||
echo.
|
||||
echo Compiling and running...
|
||||
go run "cmd/main.go" "%1"
|
||||
|
||||
if errorlevel 1 (
|
||||
echo.
|
||||
echo Failed to run, please check error messages above
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
echo.
|
||||
echo Crawling completed successfully!
|
||||
pause
|
||||
exit /b 0
|
||||
) else (
|
||||
REM With no argument, run in interactive mode
|
||||
:input_loop
|
||||
cls
|
||||
echo ========================================
|
||||
echo WeChat Official Account Article Crawler
|
||||
echo ========================================
|
||||
echo.
|
||||
echo Please enter WeChat article link:
|
||||
echo Example: https://mp.weixin.qq.com/s/4r_LKJu0mOeUc70ZZXK9LA
|
||||
set /p ARTICLE_LINK=
|
||||
|
||||
if "!ARTICLE_LINK!"=="" (
|
||||
echo.
|
||||
echo Error: Article link cannot be empty!
|
||||
pause
|
||||
goto input_loop
|
||||
)
|
||||
|
||||
echo.
|
||||
echo Compiling and running...
|
||||
go run "cmd/main.go" "!ARTICLE_LINK!"
|
||||
|
||||
if errorlevel 1 (
|
||||
echo.
|
||||
echo Failed to run, please check error messages above
|
||||
pause
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
echo.
|
||||
echo Crawling completed successfully!
|
||||
pause
|
||||
)
|
||||
21
backend/tools/view_db.bat
Normal file
@@ -0,0 +1,21 @@
|
||||
@echo off
|
||||
chcp 65001 >nul
|
||||
cls
|
||||
|
||||
echo ===============================================
|
||||
echo 📊 数据库内容查看工具
|
||||
echo ===============================================
|
||||
echo.
|
||||
|
||||
cd /d "%~dp0"
|
||||
|
||||
echo 正在查询数据库...
|
||||
echo.
|
||||
|
||||
go run view_db.go
|
||||
|
||||
echo.
|
||||
echo ===============================================
|
||||
echo 查询完成!
|
||||
echo ===============================================
|
||||
pause
|
||||
231
backend/tools/view_db.go
Normal file
@@ -0,0 +1,231 @@
|
||||
package main
|
||||
|
||||
import (
|
||||
"database/sql"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"log"
|
||||
|
||||
_ "modernc.org/sqlite"
|
||||
)
|
||||
|
||||
func main() {
|
||||
// Open the database
|
||||
db, err := sql.Open("sqlite", "../../data/wechat_articles.db")
|
||||
if err != nil {
|
||||
log.Fatal("打开数据库失败:", err)
|
||||
}
|
||||
defer db.Close()
|
||||
|
||||
fmt.Println("=" + repeatStr("=", 80))
|
||||
fmt.Println("📊 微信公众号文章数据库内容查看")
|
||||
fmt.Println("=" + repeatStr("=", 80))
|
||||
|
||||
// Query official accounts
|
||||
fmt.Println("\n📢 【公众号列表】")
|
||||
fmt.Println(repeatStr("-", 80))
|
||||
queryOfficialAccounts(db)
|
||||
|
||||
// Query articles
|
||||
fmt.Println("\n📝 【文章列表】")
|
||||
fmt.Println(repeatStr("-", 80))
|
||||
queryArticles(db)
|
||||
|
||||
// Query article contents
|
||||
fmt.Println("\n📄 【文章详细内容】")
|
||||
fmt.Println(repeatStr("-", 80))
|
||||
queryArticleContents(db)
|
||||
|
||||
fmt.Println("\n" + repeatStr("=", 80))
|
||||
}
|
||||
|
||||
func queryOfficialAccounts(db *sql.DB) {
|
||||
rows, err := db.Query(`
|
||||
SELECT id, biz, nickname, homepage, description, created_at, updated_at
|
||||
FROM official_accounts
|
||||
ORDER BY id
|
||||
`)
|
||||
if err != nil {
|
||||
log.Printf("查询公众号失败: %v\n", err)
|
||||
return
|
||||
}
|
||||
defer rows.Close()
|
||||
|
||||
count := 0
|
||||
for rows.Next() {
|
||||
var id int
|
||||
var biz, nickname, homepage, description, createdAt, updatedAt string
|
||||
err := rows.Scan(&id, &biz, &nickname, &homepage, &description, &createdAt, &updatedAt)
|
||||
if err != nil {
|
||||
log.Printf("读取数据失败: %v\n", err)
|
||||
continue
|
||||
}
|
||||
count++
|
||||
|
||||
fmt.Printf("\n🔹 公众号 #%d\n", id)
|
||||
fmt.Printf(" 名称: %s\n", nickname)
|
||||
fmt.Printf(" BIZ: %s\n", biz)
|
||||
fmt.Printf(" 主页: %s\n", homepage)
|
||||
fmt.Printf(" 简介: %s\n", description)
|
||||
fmt.Printf(" 创建时间: %s\n", createdAt)
|
||||
fmt.Printf(" 更新时间: %s\n", updatedAt)
|
||||
}
|
||||
|
||||
if count == 0 {
|
||||
fmt.Println(" 暂无数据")
|
||||
} else {
|
||||
fmt.Printf("\n总计: %d 个公众号\n", count)
|
||||
}
|
||||
}
|
||||
|
||||
func queryArticles(db *sql.DB) {
|
||||
rows, err := db.Query(`
|
||||
SELECT a.id, a.official_id, a.title, a.author, a.link, a.publish_time,
|
||||
a.read_num, a.like_num, a.share_num, a.paragraph_count,
|
||||
a.content_preview, a.created_at, oa.nickname
|
||||
FROM articles a
|
||||
LEFT JOIN official_accounts oa ON a.official_id = oa.id
|
||||
ORDER BY a.id
|
||||
`)
|
||||
if err != nil {
|
||||
log.Printf("查询文章失败: %v\n", err)
|
||||
return
|
||||
}
|
||||
defer rows.Close()
|
||||
|
||||
count := 0
|
||||
for rows.Next() {
|
||||
var id, officialID, readNum, likeNum, shareNum, paragraphCount int
|
||||
var title, author, link, publishTime, contentPreview, createdAt, officialName sql.NullString
|
||||
err := rows.Scan(&id, &officialID, &title, &author, &link, &publishTime,
|
||||
&readNum, &likeNum, &shareNum, &paragraphCount, &contentPreview, &createdAt, &officialName)
|
||||
if err != nil {
|
||||
log.Printf("读取数据失败: %v\n", err)
|
||||
continue
|
||||
}
|
||||
count++
|
||||
|
||||
fmt.Printf("\n🔹 文章 #%d\n", id)
|
||||
fmt.Printf(" 标题: %s\n", getStringValue(title))
|
||||
if officialName.Valid {
|
||||
fmt.Printf(" 公众号: %s\n", officialName.String)
|
||||
}
|
||||
fmt.Printf(" 作者: %s\n", getStringValue(author))
|
||||
fmt.Printf(" 链接: %s\n", getStringValue(link))
|
||||
fmt.Printf(" 发布时间: %s\n", getStringValue(publishTime))
|
||||
fmt.Printf(" 阅读数: %d | 点赞数: %d | 分享数: %d\n", readNum, likeNum, shareNum)
|
||||
fmt.Printf(" 段落数: %d\n", paragraphCount)
|
||||
if contentPreview.Valid && contentPreview.String != "" {
|
||||
preview := contentPreview.String
|
||||
if runes := []rune(preview); len(runes) > 100 {
preview = string(runes[:100]) + "..."
|
||||
}
|
||||
fmt.Printf(" 内容预览: %s\n", preview)
|
||||
}
|
||||
fmt.Printf(" 抓取时间: %s\n", getStringValue(createdAt))
|
||||
}
|
||||
|
||||
if count == 0 {
|
||||
fmt.Println(" 暂无数据")
|
||||
} else {
|
||||
fmt.Printf("\n总计: %d 篇文章\n", count)
|
||||
}
|
||||
}
|
||||
|
||||
func queryArticleContents(db *sql.DB) {
|
||||
rows, err := db.Query(`
|
||||
SELECT ac.id, ac.article_id, ac.html_content, ac.text_content,
|
||||
ac.paragraphs, ac.images, ac.created_at, a.title
|
||||
FROM article_contents ac
|
||||
LEFT JOIN articles a ON ac.article_id = a.id
|
||||
ORDER BY ac.id
|
||||
`)
|
||||
if err != nil {
|
||||
log.Printf("查询文章内容失败: %v\n", err)
|
||||
return
|
||||
}
|
||||
defer rows.Close()
|
||||
|
||||
count := 0
|
||||
for rows.Next() {
|
||||
var id, articleID int
|
||||
var htmlContent, textContent, paragraphs, images, createdAt, title sql.NullString
|
||||
err := rows.Scan(&id, &articleID, &htmlContent, &textContent,
|
||||
&paragraphs, &images, &createdAt, &title)
|
||||
if err != nil {
|
||||
log.Printf("读取数据失败: %v\n", err)
|
||||
continue
|
||||
}
|
||||
count++
|
||||
|
||||
fmt.Printf("\n🔹 内容 #%d (文章ID: %d)\n", id, articleID)
|
||||
if title.Valid {
|
||||
fmt.Printf(" 文章标题: %s\n", title.String)
|
||||
}
|
||||
|
||||
// Length of the HTML content
|
||||
htmlLen := 0
|
||||
if htmlContent.Valid {
|
||||
htmlLen = len(htmlContent.String)
|
||||
}
|
||||
fmt.Printf(" HTML内容长度: %d 字符\n", htmlLen)
|
||||
|
||||
// Text content
|
||||
if textContent.Valid && textContent.String != "" {
|
||||
text := textContent.String
|
||||
if runes := []rune(text); len(runes) > 200 {
text = string(runes[:200]) + "..."
|
||||
}
|
||||
fmt.Printf(" 文本内容: %s\n", text)
|
||||
}
|
||||
|
||||
// Paragraph information
|
||||
if paragraphs.Valid && paragraphs.String != "" {
|
||||
var paragraphList []interface{}
|
||||
if err := json.Unmarshal([]byte(paragraphs.String), &paragraphList); err == nil {
|
||||
fmt.Printf(" 段落数量: %d\n", len(paragraphList))
|
||||
}
|
||||
}
|
||||
|
||||
// Image information
|
||||
if images.Valid && images.String != "" {
|
||||
var imageList []interface{}
|
||||
if err := json.Unmarshal([]byte(images.String), &imageList); err == nil {
|
||||
fmt.Printf(" 图片数量: %d\n", len(imageList))
|
||||
if len(imageList) > 0 {
|
||||
fmt.Printf(" 图片URL:\n")
|
||||
for i, img := range imageList {
|
||||
if i >= 3 {
|
||||
fmt.Printf(" ... 还有 %d 张图片\n", len(imageList)-3)
|
||||
break
|
||||
}
|
||||
fmt.Printf(" %d. %v\n", i+1, img)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fmt.Printf(" 存储时间: %s\n", getStringValue(createdAt))
|
||||
}
|
||||
|
||||
if count == 0 {
|
||||
fmt.Println(" 暂无数据")
|
||||
} else {
|
||||
fmt.Printf("\n总计: %d 条详细内容\n", count)
|
||||
}
|
||||
}
|
||||
|
||||
func getStringValue(s sql.NullString) string {
|
||||
if s.Valid {
|
||||
return s.String
|
||||
}
|
||||
return ""
|
||||
}
|
||||
|
||||
func repeatStr(s string, n int) string {
|
||||
result := ""
|
||||
for i := 0; i < n; i++ {
|
||||
result += s
|
||||
}
|
||||
return result
|
||||
}
|
||||
222
frontend/README.md
Normal file
@@ -0,0 +1,222 @@
|
||||
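# 🚀 WeChat Official Account Article Crawler - Web Interface

A modern web interface for managing the WeChat Official Account article crawler.

## 📋 Features

### 🍪 Cookie Configuration
- Convenient cookie input and saving
- Cookie example and validation
- Real-time status feedback

### 📄 Download a Single Article
- Accepts a WeChat article link
- Optionally saves images and content
- Shows download progress in real time

### 📋 Fetch the Article List
- Access Token URL input
- Configurable number of pages to fetch
- Bulk retrieval of article information

### 📦 Batch Download Articles
- Accepts an Official Account name or link
- Downloads article details in bulk
- Smart progress tracking

### 📊 Data Management
- Overview of downloaded data
- Article statistics
- Quick access to folders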
|
||||
|
||||
## 🛠 Usage

### Method 1: Quick Start (Recommended)

1. **Double-click the startup script**
   ```
   start_web.bat
   ```

2. **The browser opens automatically**
   - The script detects Python, or falls back to PowerShell
   - Default address: `http://localhost:8000` or `http://localhost:8080`

### Method 2: Manual Start

#### Using Python (Recommended)
```bash
cd frontend
python -m http.server 8000
```

#### Using Node.js
```bash
cd frontend
npx http-server -p 8000
```

#### Using Another Web Server
- Serve the frontend folder as the web root
|
||||
|
||||
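## 🎮 Using the Interface

### Main Screen
- **Feature cards**: click a card to open the corresponding feature
- **Modern UI**: responsive design for desktop and mobile
- **Status indicators**: show operation status and progress in real time

### Cookie Configuration Page
1. Click the "Cookie 配置" card
2. Paste the cookie captured with Fiddler
3. Click the "保存Cookie" button
4. Wait for the success message

### Download a Single Article
1. Open the "下载单篇文章" feature
2. Enter the full WeChat article link
3. Choose whether to save images and content
4. Click "开始下载" and watch the progress

### Fetch the Article List
1. Open the "获取文章列表" feature
2. Paste the full URL, including its authentication parameters
3. Set the number of pages to fetch (optional)
4. Click "开始获取" to run the task

### Batch Download
1. Open the "批量下载文章" feature
2. Enter the Official Account name or the link to any of its articles
3. Choose the save options
4. Click "开始批量下载"

### Data Management
1. Open the "数据管理" feature
2. Click "刷新列表" to view downloaded data
3. View article details or open the folder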
|
||||
|
||||
## 🎨 Interface Highlights

### Responsive Design
- ✅ Optimized for desktop
- ✅ Compatible with tablets and phones
- ✅ Adaptive layout

### Modern UI
- 🎨 Gradient color scheme
- 💫 Smooth animations
- 📱 Card-based design language
- 🌟 Hover feedback

### Interaction
- ⌨️ Keyboard shortcuts (ESC to go back, Ctrl+Enter to run)
- 🔄 Real-time progress bar
- 📊 Status indicators
- 🔔 Operation feedback messages

## 🔧 Technical Architecture

### Front-End Stack
- **HTML5**: modern semantic markup
- **CSS3**: Flexbox/Grid plus animations
- **JavaScript**: ES6+ with jQuery
- **Responsive**: mobile-first design

### File Structure
```
frontend/
├── index.html          # Main page
├── css/
│   └── style.css      # Stylesheet
├── js/
│   └── app.js         # Application logic
├── start_web.bat      # Startup script
└── README.md          # Documentation
```

### Back-End Interaction
- This is currently a demo version that simulates the back end in the browser
- The full API interface structure is reserved for later use
- Integration with the command-line program is supported
|
||||
|
||||
## 🚀 Deployment Options

### Local Development
```bash
# Clone the project
cd frontend

# Start a development server
python -m http.server 8000
# or
npx http-server -p 8000
```

### Production
- **Nginx**: serve the static files
- **Apache**: configure a virtual host
- **IIS**: deploy on Windows Server
- **Docker**: containerized deployment

## 📊 Browser Compatibility

| Browser | Version | Support |
|---------|---------|---------|
| Chrome | 60+ | ✅ Full support |
| Firefox | 55+ | ✅ Full support |
| Safari | 12+ | ✅ Full support |
| Edge | 79+ | ✅ Full support |
| IE | 11 | ⚠️ Basic support |
|
||||
|
||||
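## 🐛 FAQ

### Q: The page does not open, or the styles look wrong?
A: Make sure all files keep the directory layout shown in this README, and open the page through an HTTP server (not the file:// protocol).

### Q: Nothing happens when I click a button?
A: Check the browser console for JavaScript errors and confirm that jQuery loaded correctly.

### Q: The progress bar does not appear?
A: This is a demo version, so progress is simulated. A real deployment needs to connect to the back-end API.

### Q: How do I connect to the real back end?
A: Edit the API calls in `js/app.js`, replacing the simulated logic with real HTTP requests.

## 🔮 Roadmap

### Planned for v1.1
- [ ] Real back-end API integration
- [ ] Real-time communication over WebSocket
- [ ] Drag-and-drop file upload
- [ ] Task queue management

### Planned for v1.2
- [ ] User authentication
- [ ] Management of multiple Official Accounts
- [ ] Data visualization charts
- [ ] Improved export features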
|
||||
|
||||
## 📄 License

This project is for learning and research only. Please comply with applicable laws, regulations, and terms of service.

## 🤝 Contributing

1. Fork this project
2. Create a feature branch
3. Commit your changes
4. Open a Pull Request

## 📞 Support

For questions or suggestions, get in touch through:

- 📧 Email: your-email@example.com
- 💬 Issues: open an Issue on GitHub
- 📱 QQ group: 123456789

---

**⚠️ Disclaimer**: This tool is for learning and exchange only. Please comply with applicable laws, regulations, and platform terms of service. Users bear the risks of use themselves.

**🌟 If this project helps you, please give it a Star!**
|
||||
482
frontend/css/style.css
Normal file
@@ -0,0 +1,482 @@
|
||||
/* Global styles */
|
||||
* {
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
box-sizing: border-box;
|
||||
}
|
||||
|
||||
body {
|
||||
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
|
||||
line-height: 1.6;
|
||||
color: #333;
|
||||
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
||||
min-height: 100vh;
|
||||
}
|
||||
|
||||
.container {
|
||||
max-width: 1200px;
|
||||
margin: 0 auto;
|
||||
padding: 20px;
|
||||
}
|
||||
|
||||
/* Header styles */
|
||||
.header {
|
||||
text-align: center;
|
||||
color: white;
|
||||
margin-bottom: 30px;
|
||||
}
|
||||
|
||||
.header h1 {
|
||||
font-size: 2.5em;
|
||||
margin-bottom: 10px;
|
||||
text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.3);
|
||||
}
|
||||
|
||||
.subtitle {
|
||||
font-size: 1.1em;
|
||||
opacity: 0.9;
|
||||
}
|
||||
|
||||
/* Main content area */
|
||||
.main-content {
|
||||
background: white;
|
||||
border-radius: 15px;
|
||||
padding: 30px;
|
||||
box-shadow: 0 10px 30px rgba(0, 0, 0, 0.2);
|
||||
min-height: 500px;
|
||||
}
|
||||
|
||||
/* Feature card grid */
|
||||
.feature-cards {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fit, minmax(280px, 1fr));
|
||||
gap: 25px;
|
||||
margin-bottom: 30px;
|
||||
}
|
||||
|
||||
.card {
|
||||
background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%);
|
||||
border-radius: 12px;
|
||||
padding: 25px;
|
||||
text-align: center;
|
||||
transition: all 0.3s ease;
|
||||
border: 2px solid transparent;
|
||||
cursor: pointer;
|
||||
}
|
||||
|
||||
.card:hover {
|
||||
transform: translateY(-5px);
|
||||
box-shadow: 0 15px 35px rgba(0, 0, 0, 0.1);
|
||||
border-color: #667eea;
|
||||
}
|
||||
|
||||
.card-icon {
|
||||
font-size: 3em;
|
||||
margin-bottom: 15px;
|
||||
}
|
||||
|
||||
.card h3 {
|
||||
font-size: 1.3em;
|
||||
margin-bottom: 10px;
|
||||
color: #2c3e50;
|
||||
}
|
||||
|
||||
.card p {
|
||||
color: #7f8c8d;
|
||||
margin-bottom: 20px;
|
||||
}
|
||||
|
||||
/* Button styles */
|
||||
.btn {
|
||||
padding: 12px 24px;
|
||||
border: none;
|
||||
border-radius: 8px;
|
||||
cursor: pointer;
|
||||
font-size: 1em;
|
||||
font-weight: 600;
|
||||
transition: all 0.3s ease;
|
||||
text-decoration: none;
|
||||
display: inline-block;
|
||||
min-width: 100px;
|
||||
}
|
||||
|
||||
.btn-primary {
|
||||
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
||||
color: white;
|
||||
}
|
||||
|
||||
.btn-primary:hover {
|
||||
transform: translateY(-2px);
|
||||
box-shadow: 0 8px 25px rgba(102, 126, 234, 0.4);
|
||||
}
|
||||
|
||||
.btn-success {
|
||||
background: linear-gradient(135deg, #56ab2f 0%, #a8e6cf 100%);
|
||||
color: white;
|
||||
}
|
||||
|
||||
.btn-success:hover {
|
||||
transform: translateY(-2px);
|
||||
box-shadow: 0 8px 25px rgba(86, 171, 47, 0.4);
|
||||
}
|
||||
|
||||
.btn-info {
|
||||
background: linear-gradient(135deg, #3498db 0%, #74b9ff 100%);
|
||||
color: white;
|
||||
}
|
||||
|
||||
.btn-info:hover {
|
||||
transform: translateY(-2px);
|
||||
box-shadow: 0 8px 25px rgba(52, 152, 219, 0.4);
|
||||
}
|
||||
|
||||
.btn-warning {
|
||||
background: linear-gradient(135deg, #f39c12 0%, #fdcb6e 100%);
|
||||
color: white;
|
||||
}
|
||||
|
||||
.btn-warning:hover {
|
||||
transform: translateY(-2px);
|
||||
box-shadow: 0 8px 25px rgba(243, 156, 18, 0.4);
|
||||
}
|
||||
|
||||
.btn-secondary {
|
||||
background: linear-gradient(135deg, #95a5a6 0%, #bdc3c7 100%);
|
||||
color: white;
|
||||
}
|
||||
|
||||
.btn-secondary:hover {
|
||||
transform: translateY(-2px);
|
||||
box-shadow: 0 8px 25px rgba(149, 165, 166, 0.4);
|
||||
}
|
||||
|
||||
/* Section styles */
|
||||
.section {
|
||||
animation: fadeIn 0.5s ease-in;
|
||||
}
|
||||
|
||||
@keyframes fadeIn {
|
||||
from { opacity: 0; transform: translateY(20px); }
|
||||
to { opacity: 1; transform: translateY(0); }
|
||||
}
|
||||
|
||||
.section-header {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
margin-bottom: 30px;
|
||||
padding-bottom: 15px;
|
||||
border-bottom: 2px solid #ecf0f1;
|
||||
}
|
||||
|
||||
.section-header h2 {
|
||||
color: #2c3e50;
|
||||
font-size: 1.8em;
|
||||
}
|
||||
|
||||
/* Form styles */
|
||||
.form-group {
|
||||
margin-bottom: 25px;
|
||||
}
|
||||
|
||||
.form-group label {
|
||||
display: block;
|
||||
margin-bottom: 8px;
|
||||
font-weight: 600;
|
||||
color: #2c3e50;
|
||||
}
|
||||
|
||||
.form-group input,
|
||||
.form-group textarea,
|
||||
.form-group select {
|
||||
width: 100%;
|
||||
padding: 12px 15px;
|
||||
border: 2px solid #ddd;
|
||||
border-radius: 8px;
|
||||
font-size: 1em;
|
||||
transition: border-color 0.3s ease;
|
||||
font-family: inherit;
|
||||
}
|
||||
|
||||
.form-group input:focus,
|
||||
.form-group textarea:focus,
|
||||
.form-group select:focus {
|
||||
outline: none;
|
||||
border-color: #667eea;
|
||||
box-shadow: 0 0 0 3px rgba(102, 126, 234, 0.1);
|
||||
}
|
||||
|
||||
.form-group small {
|
||||
display: block;
|
||||
margin-top: 5px;
|
||||
color: #7f8c8d;
|
||||
font-size: 0.9em;
|
||||
}
|
||||
|
||||
.checkbox-label {
|
||||
display: inline-flex !important;
|
||||
align-items: center;
|
||||
margin-right: 20px;
|
||||
margin-bottom: 10px;
|
||||
font-weight: normal !important;
|
||||
}
|
||||
|
||||
.checkbox-label input[type="checkbox"] {
|
||||
width: auto !important;
|
||||
margin-right: 8px;
|
||||
}
|
||||
|
||||
.form-actions {
|
||||
display: flex;
|
||||
gap: 15px;
|
||||
flex-wrap: wrap;
|
||||
margin-top: 25px;
|
||||
}
|
||||
|
||||
/* Result display area */
|
||||
.result {
|
||||
margin-top: 25px;
|
||||
padding: 20px;
|
||||
border-radius: 8px;
|
||||
display: none;
|
||||
}
|
||||
|
||||
.result.success {
|
||||
background-color: #d4edda;
|
||||
color: #155724;
|
||||
border: 1px solid #c3e6cb;
|
||||
}
|
||||
|
||||
.result.error {
|
||||
background-color: #f8d7da;
|
||||
color: #721c24;
|
||||
border: 1px solid #f5c6cb;
|
||||
}
|
||||
|
||||
.result.info {
|
||||
background-color: #d1ecf1;
|
||||
color: #0c5460;
|
||||
border: 1px solid #bee5eb;
|
||||
}
|
||||
|
||||
.result.loading {
|
||||
background-color: #fff3cd;
|
||||
color: #856404;
|
||||
border: 1px solid #ffeaa7;
|
||||
}
|
||||
|
||||
/* Data list styles */
|
||||
.data-list {
|
||||
margin-top: 25px;
|
||||
}
|
||||
|
||||
.data-item {
|
||||
background: #f8f9fa;
|
||||
border: 1px solid #dee2e6;
|
||||
border-radius: 8px;
|
||||
padding: 20px;
|
||||
margin-bottom: 15px;
|
||||
transition: all 0.3s ease;
|
||||
}
|
||||
|
||||
.data-item:hover {
|
||||
transform: translateX(5px);
|
||||
box-shadow: 0 5px 15px rgba(0, 0, 0, 0.1);
|
||||
}
|
||||
|
||||
.data-item h4 {
|
||||
color: #2c3e50;
|
||||
margin-bottom: 10px;
|
||||
font-size: 1.2em;
|
||||
}
|
||||
|
||||
.data-item-info {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
flex-wrap: wrap;
|
||||
gap: 10px;
|
||||
}
|
||||
|
||||
.data-item-stats {
|
||||
display: flex;
|
||||
gap: 20px;
|
||||
align-items: center;
|
||||
}
|
||||
|
||||
.stat-item {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 5px;
|
||||
font-size: 0.9em;
|
||||
color: #7f8c8d;
|
||||
}
|
||||
|
||||
.data-item-actions {
|
||||
display: flex;
|
||||
gap: 10px;
|
||||
}
|
||||
|
||||
/* Progress bar styles */
|
||||
.progress-container {
|
||||
margin-top: 20px;
|
||||
}
|
||||
|
||||
.progress-bar {
|
||||
width: 100%;
|
||||
height: 8px;
|
||||
background-color: #ecf0f1;
|
||||
border-radius: 4px;
|
||||
overflow: hidden;
|
||||
}
|
||||
|
||||
.progress-fill {
|
||||
height: 100%;
|
||||
background: linear-gradient(90deg, #667eea, #764ba2);
|
||||
width: 0%;
|
||||
transition: width 0.3s ease;
|
||||
animation: pulse 1.5s infinite;
|
||||
}
|
||||
|
||||
@keyframes pulse {
|
||||
0%, 100% { opacity: 1; }
|
||||
50% { opacity: 0.7; }
|
||||
}
|
||||
|
||||
.progress-text {
|
||||
margin-top: 10px;
|
||||
text-align: center;
|
||||
font-size: 0.9em;
|
||||
color: #7f8c8d;
|
||||
}
|
||||
|
||||
/* Status indicators */
|
||||
.status-indicator {
|
||||
display: inline-block;
|
||||
width: 10px;
|
||||
height: 10px;
|
||||
border-radius: 50%;
|
||||
margin-right: 8px;
|
||||
}
|
||||
|
||||
.status-success { background-color: #27ae60; }
|
||||
.status-error { background-color: #e74c3c; }
|
||||
.status-warning { background-color: #f39c12; }
|
||||
.status-info { background-color: #3498db; }
|
||||
|
||||
/* Footer styles */
|
||||
.footer {
|
||||
text-align: center;
|
||||
margin-top: 30px;
|
||||
padding: 20px;
|
||||
color: white;
|
||||
opacity: 0.8;
|
||||
}
|
||||
|
||||
/* Tooltips */
|
||||
.tooltip {
|
||||
position: relative;
|
||||
cursor: help;
|
||||
}
|
||||
|
||||
.tooltip:hover::after {
|
||||
content: attr(data-tooltip);
|
||||
position: absolute;
|
||||
bottom: 100%;
|
||||
left: 50%;
|
||||
transform: translateX(-50%);
|
||||
background: #2c3e50;
|
||||
color: white;
|
||||
padding: 8px 12px;
|
||||
border-radius: 4px;
|
||||
font-size: 0.9em;
|
||||
white-space: nowrap;
|
||||
z-index: 1000;
|
||||
}
|
||||
|
||||
/* Responsive design */
|
||||
@media (max-width: 768px) {
|
||||
.container {
|
||||
padding: 10px;
|
||||
}
|
||||
|
||||
.header h1 {
|
||||
font-size: 2em;
|
||||
}
|
||||
|
||||
.main-content {
|
||||
padding: 20px;
|
||||
}
|
||||
|
||||
.feature-cards {
|
||||
grid-template-columns: 1fr;
|
||||
gap: 15px;
|
||||
}
|
||||
|
||||
.section-header {
|
||||
flex-direction: column;
|
||||
align-items: flex-start;
|
||||
gap: 15px;
|
||||
}
|
||||
|
||||
.form-actions {
|
||||
flex-direction: column;
|
||||
}
|
||||
|
||||
.data-item-info {
|
||||
flex-direction: column;
|
||||
align-items: flex-start;
|
||||
}
|
||||
|
||||
.data-item-actions {
|
||||
width: 100%;
|
||||
justify-content: flex-start;
|
||||
}
|
||||
}
|
||||
|
||||
@media (max-width: 480px) {
|
||||
.header h1 {
|
||||
font-size: 1.8em;
|
||||
}
|
||||
|
||||
.card {
|
||||
padding: 20px;
|
||||
}
|
||||
|
||||
.btn {
|
||||
padding: 10px 20px;
|
||||
font-size: 0.9em;
|
||||
}
|
||||
}
|
||||
|
||||
/* Loading animation */
|
||||
.loading-spinner {
|
||||
display: inline-block;
|
||||
width: 20px;
|
||||
height: 20px;
|
||||
border: 2px solid #f3f3f3;
|
||||
border-top: 2px solid #667eea;
|
||||
border-radius: 50%;
|
||||
animation: spin 1s linear infinite;
|
||||
margin-right: 10px;
|
||||
}
|
||||
|
||||
@keyframes spin {
|
||||
0% { transform: rotate(0deg); }
|
||||
100% { transform: rotate(360deg); }
|
||||
}
|
||||
|
||||
/* Text alignment utility classes */
|
||||
.text-center { text-align: center; }
|
||||
.text-left { text-align: left; }
|
||||
.text-right { text-align: right; }
|
||||
|
||||
/* Spacing utility classes */
|
||||
.mt-10 { margin-top: 10px; }
|
||||
.mt-20 { margin-top: 20px; }
|
||||
.mb-10 { margin-bottom: 10px; }
|
||||
.mb-20 { margin-bottom: 20px; }
|
||||
|
||||
/* Show/hide utility classes */
|
||||
.hidden { display: none !important; }
|
||||
.visible { display: block !important; }
|
||||
167
frontend/index.html
Normal file
@@ -0,0 +1,167 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="zh-CN">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>微信公众号文章爬虫系统</title>
|
||||
<link rel="stylesheet" href="css/style.css">
|
||||
</head>
|
||||
<body>
|
||||
<div class="container">
|
||||
<header class="header">
|
||||
<h1>🚀 微信公众号文章爬虫系统</h1>
|
||||
<p class="subtitle">Wechat Official Account Article Crawler</p>
|
||||
</header>
|
||||
|
||||
<div class="main-content">
|
||||
<!-- Feature selection cards -->
|
||||
<div class="feature-cards">
|
||||
<div class="card" id="card-homepage">
|
||||
<div class="card-icon">🏠</div>
|
||||
<h3>提取公众号主页</h3>
|
||||
<p>输入文章链接获取公众号主页链接</p>
|
||||
<button class="btn btn-primary" onclick="showSection('homepage')">进入</button>
|
||||
</div>
|
||||
|
||||
<div class="card" id="card-single">
|
||||
<div class="card-icon">📄</div>
|
||||
<h3>下载单篇文章</h3>
|
||||
<p>根据链接下载单篇文章</p>
|
||||
<button class="btn btn-primary" onclick="showSection('single')">进入</button>
|
||||
</div>
|
||||
|
||||
<div class="card" id="card-list">
|
||||
<div class="card-icon">📋</div>
|
||||
<h3>获取文章列表</h3>
|
||||
<p>获取公众号所有文章列表</p>
|
||||
<button class="btn btn-primary" onclick="showSection('list')">进入</button>
|
||||
</div>
|
||||
|
||||
<div class="card" id="card-batch">
|
||||
<div class="card-icon">📦</div>
|
||||
<h3>批量下载文章</h3>
|
||||
<p>批量下载文章详细内容</p>
|
||||
<button class="btn btn-primary" onclick="showSection('batch')">进入</button>
|
||||
</div>
|
||||
|
||||
<div class="card" id="card-data">
|
||||
<div class="card-icon">📊</div>
|
||||
<h3>数据管理</h3>
|
||||
<p>查看已下载的文章数据</p>
|
||||
<button class="btn btn-primary" onclick="showSection('data')">进入</button>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- 提取公众号主页区域 -->
|
||||
<div class="section" id="section-homepage" style="display:none;">
|
||||
<div class="section-header">
|
||||
<h2>🏠 提取公众号主页</h2>
|
||||
<button class="btn btn-secondary" onclick="showSection('home')">返回</button>
|
||||
</div>
|
||||
<div class="form-group">
|
||||
<label>公众号文章链接:</label>
|
||||
<input type="text" id="homepage-url" placeholder="请输入公众号下任意一篇已发布的文章链接...">
|
||||
<small>支持公众号文章完整URL,无需Cookie即可获取公众号主页链接</small>
|
||||
</div>
|
||||
<div class="form-actions">
|
||||
<button class="btn btn-success" onclick="extractHomepage()">提取主页链接</button>
|
||||
<button class="btn btn-info" onclick="loadExampleUrl()">查看示例</button>
|
||||
</div>
|
||||
<div class="result" id="homepage-result"></div>
|
||||
</div>
|
||||
|
||||
<!-- 下载单篇文章区域 -->
|
||||
<div class="section" id="section-single" style="display:none;">
|
||||
<div class="section-header">
|
||||
<h2>📄 下载单篇文章</h2>
|
||||
<button class="btn btn-secondary" onclick="showSection('home')">返回</button>
|
||||
</div>
|
||||
<div class="form-group">
|
||||
<label>文章链接:</label>
|
||||
<input type="text" id="article-url" placeholder="请输入微信文章链接...">
|
||||
</div>
|
||||
<div class="form-group">
|
||||
<label class="checkbox-label">
|
||||
<input type="checkbox" id="save-image" checked> 保存图片
|
||||
</label>
|
||||
<label class="checkbox-label">
|
||||
<input type="checkbox" id="save-content" checked> 保存内容
|
||||
</label>
|
||||
</div>
|
||||
<div class="form-actions">
|
||||
<button class="btn btn-success" onclick="downloadSingleArticle()">开始下载</button>
|
||||
</div>
|
||||
<div class="result" id="single-result"></div>
|
||||
</div>
|
||||
|
||||
<!-- 获取文章列表区域 -->
|
||||
<div class="section" id="section-list" style="display:none;">
|
||||
<div class="section-header">
|
||||
<h2>📋 获取文章列表</h2>
|
||||
<button class="btn btn-secondary" onclick="showSection('home')">返回</button>
|
||||
</div>
|
||||
<div class="form-group">
|
||||
<label>Access Token URL:</label>
|
||||
<textarea id="access-token" placeholder="请粘贴从Fiddler获取的完整URL..." rows="4"></textarea>
|
||||
<small>包含 __biz, uin, key, pass_ticket 等参数的完整URL</small>
|
||||
</div>
|
||||
<div class="form-group">
|
||||
<label>获取页数:</label>
|
||||
<input type="number" id="pages" value="1" min="1" max="999">
|
||||
<small>留空表示获取全部</small>
|
||||
</div>
|
||||
<div class="form-actions">
|
||||
<button class="btn btn-success" onclick="getArticleList()">开始获取</button>
|
||||
</div>
|
||||
<div class="result" id="list-result"></div>
|
||||
</div>
|
||||
|
||||
<!-- 批量下载区域 -->
|
||||
<div class="section" id="section-batch" style="display:none;">
|
||||
<div class="section-header">
|
||||
<h2>📦 批量下载文章</h2>
|
||||
<button class="btn btn-secondary" onclick="showSection('home')">返回</button>
|
||||
</div>
|
||||
<div class="form-group">
|
||||
<label>公众号名称或文章链接:</label>
|
||||
<input type="text" id="official-account" placeholder="请输入公众号名称或任意一篇文章链接...">
|
||||
</div>
|
||||
<div class="form-group">
|
||||
<label class="checkbox-label">
|
||||
<input type="checkbox" id="batch-save-image"> 保存图片
|
||||
</label>
|
||||
<label class="checkbox-label">
|
||||
<input type="checkbox" id="batch-save-content" checked> 保存内容
|
||||
</label>
|
||||
</div>
|
||||
<div class="form-actions">
|
||||
<button class="btn btn-success" onclick="batchDownload()">开始批量下载</button>
|
||||
</div>
|
||||
<div class="result" id="batch-result"></div>
|
||||
</div>
|
||||
|
||||
<!-- 数据管理区域 -->
|
||||
<div class="section" id="section-data" style="display:none;">
|
||||
<div class="section-header">
|
||||
<h2>📊 数据管理</h2>
|
||||
<button class="btn btn-secondary" onclick="showSection('home')">返回</button>
|
||||
</div>
|
||||
<div class="form-actions">
|
||||
<button class="btn btn-info" onclick="loadDataList()">刷新列表</button>
|
||||
<button class="btn btn-warning" onclick="openDataFolder()">打开数据文件夹</button>
|
||||
</div>
<div class="result" id="data-result"></div>
|
||||
<div class="data-list" id="data-list">
|
||||
<p class="text-center">点击"刷新列表"加载数据...</p>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<footer class="footer">
|
||||
<p>© 2025 微信公众号文章爬虫系统 | 仅供学习使用</p>
|
||||
</footer>
|
||||
</div>
|
||||
|
||||
<script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
|
||||
<script src="js/app.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
509
frontend/js/app.js
Normal file
@@ -0,0 +1,509 @@
|
||||
// 全局变量
|
||||
let isTaskRunning = false;
|
||||
let currentSection = 'home';
|
||||
let taskCheckInterval;
|
||||
const API_BASE_URL = 'http://localhost:8080/api'; // API基础地址
|
||||
|
||||
// DOM加载完成后初始化
|
||||
$(document).ready(function() {
|
||||
showSection('home');
|
||||
console.log('✅ 微信公众号文章爬虫系统已加载');
|
||||
});
|
||||
|
||||
// 显示指定区域
|
||||
function showSection(sectionName) {
|
||||
// 隐藏所有区域
|
||||
$('.section').hide();
|
||||
$('.feature-cards').hide();
|
||||
|
||||
if (sectionName === 'home') {
|
||||
$('.feature-cards').show();
|
||||
currentSection = 'home';
|
||||
} else {
|
||||
$('#section-' + sectionName).show();
|
||||
currentSection = sectionName;
|
||||
}
|
||||
}
|
||||
|
||||
// 提取公众号主页相关函数
|
||||
function extractHomepage() {
|
||||
const articleUrl = $('#homepage-url').val().trim();
|
||||
|
||||
if (!articleUrl) {
|
||||
showResult('homepage', 'error', '请输入文章链接');
|
||||
return;
|
||||
}
|
||||
|
||||
if (!articleUrl.includes('mp.weixin.qq.com')) {
|
||||
showResult('homepage', 'error', '请输入有效的微信公众号文章链接');
|
||||
return;
|
||||
}
|
||||
|
||||
if (isTaskRunning) {
|
||||
showResult('homepage', 'error', '有任务正在执行,请稍候...');
|
||||
return;
|
||||
}
|
||||
|
||||
isTaskRunning = true;
|
||||
showResult('homepage', 'loading', '正在提取公众号主页链接...');
|
||||
|
||||
// 调用后端API
|
||||
$.ajax({
|
||||
url: `${API_BASE_URL}/homepage/extract`,
|
||||
method: 'POST',
|
||||
contentType: 'application/json',
|
||||
data: JSON.stringify({ url: articleUrl }),
|
||||
success: function(response) {
|
||||
isTaskRunning = false;
|
||||
if (response.success && response.data && response.data.homepage) {
|
||||
const homepageUrl = response.data.homepage;
|
||||
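// Escape for a single-quoted JS string placed inside a double-quoted HTML attribute
const safeUrl = homepageUrl.replace(/\\/g, '\\\\').replace(/'/g, "\\'").replace(/&/g, '&amp;').replace(/"/g, '&quot;');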
|
||||
const resultHtml = `
|
||||
<div style="background: #f8f9fa; padding: 15px; border-radius: 8px; margin-top: 10px;">
|
||||
<h4 style="color: #28a745; margin-bottom: 10px;">✅ 提取成功</h4>
|
||||
<p><strong>公众号主页链接:</strong></p>
|
||||
<div style="background: white; padding: 10px; border: 1px solid #ddd; border-radius: 4px; word-break: break-all; font-family: monospace; font-size: 0.9em;">
|
||||
${homepageUrl}
|
||||
</div>
|
||||
<div style="margin-top: 15px;">
|
||||
<button class="btn btn-info" onclick="copyToClipboard('${safeUrl}')" style="margin-right: 10px;">📋 复制链接</button>
|
||||
<button class="btn btn-warning" onclick="openInNewTab('${safeUrl}')">🔗 打开主页</button>
|
||||
</div>
|
||||
</div>
|
||||
`;
|
||||
showResult('homepage', 'success', resultHtml);
|
||||
} else {
|
||||
showResult('homepage', 'error', response.message || '提取失败');
|
||||
}
|
||||
},
|
||||
error: function(xhr, status, error) {
|
||||
isTaskRunning = false;
|
||||
let errorMsg = '请求失败:' + error;
|
||||
if (xhr.status === 0) {
|
||||
errorMsg = '无法连接到后端服务器,请确保 API 服务器已启动(运行 api_server.exe)';
|
||||
}
|
||||
showResult('homepage', 'error', errorMsg);
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
// 生成模拟的公众号主页链接
|
||||
function generateMockHomepageUrl(articleUrl) {
|
||||
// 从文章链接中提取__biz参数来模拟真实的主页链接
|
||||
const bizMatch = articleUrl.match(/__biz=([^&]+)/);
|
||||
if (bizMatch) {
|
||||
const biz = bizMatch[1];
|
||||
return `https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=${biz}&scene=124`;
|
||||
}
|
||||
// 如果无法提取,返回示例链接
|
||||
return 'https://mp.weixin.qq.com/mp/profile_ext?action=home&__biz=MzI1NjEwMTM4OA==&scene=124';
|
||||
}
|
||||
|
||||
function loadExampleUrl() {
|
||||
const exampleUrl = 'https://mp.weixin.qq.com/s?__biz=MzI1NjEwMTM4OA==&mid=2651232405&idx=1&sn=7c8f5b2e3d4a1b9c8e7f6a5b4c3d2e1f&chksm=f1d7e8c4c6a061d2b9e8f7a6b5c4d3e2f1a0b9c8d7e6f5a4b3c2d1e0f9a8b7c6d5e4f3a2b1c0&scene=27';
|
||||
|
||||
$('#homepage-url').val(exampleUrl);
|
||||
showResult('homepage', 'info', '已加载文章链接示例,点击"提取主页链接"开始处理');
|
||||
}
|
||||
|
||||
// 打开链接的辅助函数
|
||||
function openInNewTab(url) {
|
||||
window.open(url, '_blank');
|
||||
}
|
||||
|
||||
// 下载单篇文章
|
||||
function downloadSingleArticle() {
|
||||
alert('此功能需要后端命令行支持。\n\n请使用命令行程序:\n1. 运行 wechat-crawler.exe\n2. 选择对应功能进行下载');
|
||||
showResult('single', 'info', '请使用命令行程序执行下载功能');
|
||||
}
|
||||
|
||||
// 获取文章列表
|
||||
function getArticleList() {
|
||||
const accessToken = $('#access-token').val().trim();
|
||||
const pages = parseInt($('#pages').val(), 10) || 0;
|
||||
|
||||
if (!accessToken) {
|
||||
showResult('list', 'error', '请输入Access Token URL');
|
||||
return;
|
||||
}
|
||||
|
||||
if (isTaskRunning) {
|
||||
showResult('list', 'error', '有任务正在执行,请稍候...');
|
||||
return;
|
||||
}
|
||||
|
||||
isTaskRunning = true;
|
||||
showResult('list', 'loading', '正在获取文章列表,请稍候...');
|
||||
|
||||
// 调用后端API(同步等待)
|
||||
$.ajax({
|
||||
url: `${API_BASE_URL}/article/list`,
|
||||
method: 'POST',
|
||||
contentType: 'application/json',
|
||||
data: JSON.stringify({ access_token: accessToken, pages: pages }),
|
||||
success: function(response) {
|
||||
isTaskRunning = false;
|
||||
if (response.success && response.data) {
|
||||
const data = response.data;
|
||||
const fileExt = (data.filename || '').endsWith('.txt') ? 'TXT文件' : 'Excel文件';
|
||||
const resultHtml = `
|
||||
<div style="background: #f8f9fa; padding: 15px; border-radius: 8px; margin-top: 10px;">
|
||||
<h4 style="color: #28a745; margin-bottom: 10px;">✅ 获取成功</h4>
|
||||
<p><strong>公众号:</strong>${data.account}</p>
|
||||
<p><strong>文件:</strong>${data.filename}</p>
|
||||
<div style="margin-top: 15px;">
|
||||
<a href="${API_BASE_URL}${data.download}" class="btn btn-success" download>📥 下载${fileExt}</a>
|
||||
</div>
|
||||
</div>
|
||||
`;
|
||||
showResult('list', 'success', resultHtml);
|
||||
|
||||
// 自动触发下载
|
||||
window.location.href = `${API_BASE_URL}${data.download}`;
|
||||
} else {
|
||||
showResult('list', 'error', response.message || '获取失败');
|
||||
}
|
||||
},
|
||||
error: function(xhr, status, error) {
|
||||
isTaskRunning = false;
|
||||
let errorMsg = '请求失败:' + error;
|
||||
if (xhr.status === 0) {
|
||||
errorMsg = '无法连接到后端服务器,请确保 API 服务器已启动';
|
||||
} else if (xhr.responseJSON && xhr.responseJSON.message) {
|
||||
errorMsg = xhr.responseJSON.message;
|
||||
}
|
||||
showResult('list', 'error', errorMsg);
|
||||
},
|
||||
timeout: 120000 // 2分钟超时
|
||||
});
|
||||
}
|
||||
|
||||
// 批量下载文章
|
||||
function batchDownload() {
|
||||
const officialAccount = $('#official-account').val().trim();
|
||||
const saveImage = $('#batch-save-image').is(':checked');
|
||||
const saveContent = $('#batch-save-content').is(':checked');
|
||||
|
||||
if (!officialAccount) {
|
||||
showResult('batch', 'error', '请输入公众号名称或文章链接');
|
||||
return;
|
||||
}
|
||||
|
||||
if (isTaskRunning) {
|
||||
showResult('batch', 'error', '有任务正在执行,请稍候...');
|
||||
return;
|
||||
}
|
||||
|
||||
isTaskRunning = true;
|
||||
showResult('batch', 'loading', '正在批量下载文章,请稍候...');
|
||||
|
||||
// 调用后端API(同步等待)
|
||||
$.ajax({
|
||||
url: `${API_BASE_URL}/article/batch`,
|
||||
method: 'POST',
|
||||
contentType: 'application/json',
|
||||
data: JSON.stringify({
|
||||
official_account: officialAccount,
|
||||
save_image: saveImage,
|
||||
save_content: saveContent
|
||||
}),
|
||||
success: function(response) {
|
||||
isTaskRunning = false;
|
||||
if (response.success && response.data) {
|
||||
const data = response.data;
|
||||
const resultHtml = `
|
||||
<div style="background: #f8f9fa; padding: 15px; border-radius: 8px; margin-top: 10px;">
|
||||
<h4 style="color: #28a745; margin-bottom: 10px;">✅ ${response.message}</h4>
|
||||
<p><strong>公众号:</strong>${data.account}</p>
|
||||
<p><strong>文章数量:</strong>${data.articleCount} 篇</p>
|
||||
<p><strong>保存路径:</strong>${data.path}</p>
|
||||
<div style="margin-top: 15px;">
|
||||
<button class="btn btn-info" onclick="loadDataList()">📊 查看数据列表</button>
|
||||
</div>
|
||||
</div>
|
||||
`;
|
||||
showResult('batch', 'success', resultHtml);
|
||||
} else {
|
||||
showResult('batch', 'error', response.message || '批量下载失败');
|
||||
}
|
||||
},
|
||||
error: function(xhr, status, error) {
|
||||
isTaskRunning = false;
|
||||
let errorMsg = '请求失败:' + error;
|
||||
if (xhr.status === 0) {
|
||||
errorMsg = '无法连接到后端服务器,请确保 API 服务器已启动';
|
||||
} else if (xhr.responseJSON && xhr.responseJSON.message) {
|
||||
errorMsg = xhr.responseJSON.message;
|
||||
}
|
||||
showResult('batch', 'error', errorMsg);
|
||||
},
|
||||
timeout: 300000 // 5分钟超时
|
||||
});
|
||||
}
|
||||
|
||||
// 加载数据列表
|
||||
function loadDataList() {
|
||||
showResult('data', 'loading', '正在加载数据列表...');
|
||||
|
||||
// 调用后端API
|
||||
$.ajax({
|
||||
url: `${API_BASE_URL}/data/list`,
|
||||
method: 'GET',
|
||||
success: function(response) {
|
||||
if (response.success && response.data) {
|
||||
displayDataList(response.data);
|
||||
hideResult('data');
|
||||
} else {
|
||||
showResult('data', 'error', '加载失败');
|
||||
}
|
||||
},
|
||||
error: function(xhr, status, error) {
|
||||
let errorMsg = '请求失败:' + error;
|
||||
if (xhr.status === 0) {
|
||||
errorMsg = '无法连接到后端服务器,请确保 API 服务器已启动';
|
||||
}
|
||||
showResult('data', 'error', errorMsg);
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
// 显示数据列表
|
||||
function displayDataList(dataList) {
|
||||
let html = '';
|
||||
|
||||
if (!dataList || dataList.length === 0) {
|
||||
html = '<p class="text-center" style="padding: 20px; color: #666;">暂无数据,请先使用其他功能爬取文章</p>';
|
||||
} else {
|
||||
dataList.forEach(item => {
|
||||
// Escape backslashes before quotes so the quote escapes are not doubled
const safeItemName = (item.name || '').replace(/\\/g, '\\\\').replace(/'/g, "\\'");
const safeItemPath = (item.path || '').replace(/\\/g, '\\\\').replace(/'/g, "\\'");
|
||||
html += `
|
||||
<div class="data-item">
|
||||
<h4><span class="status-indicator status-success"></span>${item.name || '未知'}</h4>
|
||||
<div class="data-item-info">
|
||||
<div class="data-item-stats">
|
||||
<div class="stat-item">
|
||||
<span>📊</span>
|
||||
<span>${item.articleCount || 0} 篇文章</span>
|
||||
</div>
|
||||
<div class="stat-item">
|
||||
<span>📅</span>
|
||||
<span>${item.lastUpdate || '未知'}</span>
|
||||
</div>
|
||||
<div class="stat-item">
|
||||
<span>📁</span>
|
||||
<span>${item.path || '未知'}</span>
|
||||
</div>
|
||||
</div>
|
||||
<div class="data-item-actions">
|
||||
<button class="btn btn-info" onclick="viewArticles('${safeItemName}')">查看文章</button>
|
||||
<button class="btn btn-warning" onclick="alert('文件夹路径:${safeItemPath}')">查看路径</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
`;
|
||||
});
|
||||
}
|
||||
|
||||
$('#data-list').html(html);
|
||||
}
|
||||
|
||||
// 查看文章列表
|
||||
function viewArticles(accountName) {
|
||||
alert(`查看 ${accountName} 的文章列表
|
||||
|
||||
这里将展示该公众号的所有文章,包括:
|
||||
- 文章标题
|
||||
- 发布时间
|
||||
- 文件大小
|
||||
- 下载状态等`);
|
||||
}
|
||||
|
||||
// 打开文件夹
|
||||
function openFolder(path) {
|
||||
alert(`打开文件夹: ${path}\n\n在实际环境中,这里会调用系统命令打开文件资源管理器。`);
|
||||
}
|
||||
|
||||
// 打开数据文件夹
|
||||
function openDataFolder() {
|
||||
alert('打开数据文件夹\n\n在实际环境中,这里会打开data目录。');
|
||||
}
|
||||
|
||||
// 任务管理函数
|
||||
function startTask(section, message) {
|
||||
isTaskRunning = true;
|
||||
showResult(section, 'loading', message);
|
||||
|
||||
// 显示进度条
|
||||
const resultDiv = $(`#${section}-result`);
|
||||
resultDiv.append(`
|
||||
<div class="progress-container">
|
||||
<div class="progress-bar">
|
||||
<div class="progress-fill"></div>
|
||||
</div>
|
||||
<div class="progress-text">0%</div>
|
||||
</div>
|
||||
`);
|
||||
|
||||
// 禁用相关按钮
|
||||
disableButtons();
|
||||
}
|
||||
|
||||
function updateTaskProgress(percent, message) {
|
||||
const progressFill = $('.progress-fill');
|
||||
const progressText = $('.progress-text');
|
||||
|
||||
progressFill.css('width', percent + '%');
|
||||
progressText.text(Math.floor(percent) + '% - ' + message);
|
||||
}
|
||||
|
||||
function endTask(section, type, message) {
|
||||
isTaskRunning = false;
|
||||
|
||||
// 移除进度条
|
||||
$('.progress-container').remove();
|
||||
|
||||
showResult(section, type, message);
|
||||
enableButtons();
|
||||
}
|
||||
|
||||
function disableButtons() {
|
||||
$('.btn').prop('disabled', true).addClass('disabled');
|
||||
}
|
||||
|
||||
function enableButtons() {
|
||||
$('.btn').prop('disabled', false).removeClass('disabled');
|
||||
}
|
||||
|
||||
// 结果显示函数
|
||||
function showResult(section, type, message) {
|
||||
const resultDiv = $(`#${section}-result`);
|
||||
resultDiv.removeClass('success error info loading')
|
||||
.addClass(type)
|
||||
.html(getResultIcon(type) + message)
|
||||
.show();
|
||||
|
||||
// 自动滚动到结果区域
|
||||
if (resultDiv.length) resultDiv[0].scrollIntoView({ behavior: 'smooth' });
|
||||
}
|
||||
|
||||
function hideResult(section) {
|
||||
$(`#${section}-result`).hide();
|
||||
}
|
||||
|
||||
function getResultIcon(type) {
|
||||
switch (type) {
|
||||
case 'success': return '<span class="loading-spinner" style="display:none;"></span>✅ ';
|
||||
case 'error': return '<span class="loading-spinner" style="display:none;"></span>❌ ';
|
||||
case 'info': return '<span class="loading-spinner" style="display:none;"></span>ℹ️ ';
|
||||
case 'loading': return '<span class="loading-spinner"></span>';
|
||||
default: return '';
|
||||
}
|
||||
}
|
||||
|
||||
// 表单验证函数
|
||||
function validateUrl(url) {
|
||||
try {
|
||||
new URL(url);
|
||||
return true;
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
function validateInput(value, type) {
|
||||
switch (type) {
|
||||
case 'url':
|
||||
return validateUrl(value);
|
||||
case 'notEmpty':
|
||||
return value.trim().length > 0;
|
||||
case 'number':
|
||||
return !isNaN(value) && parseInt(value) > 0;
|
||||
default:
|
||||
return true;
|
||||
}
|
||||
}
|
||||
|
||||
// 工具函数
|
||||
function formatFileSize(bytes) {
|
||||
if (bytes === 0) return '0 Bytes';
|
||||
const k = 1024;
|
||||
const sizes = ['Bytes', 'KB', 'MB', 'GB'];
|
||||
const i = Math.min(Math.floor(Math.log(bytes) / Math.log(k)), sizes.length - 1);
|
||||
return parseFloat((bytes / Math.pow(k, i)).toFixed(2)) + ' ' + sizes[i];
|
||||
}
|
||||
|
||||
function formatDate(dateString) {
|
||||
const date = new Date(dateString);
|
||||
return date.getFullYear() + '-' +
|
||||
String(date.getMonth() + 1).padStart(2, '0') + '-' +
|
||||
String(date.getDate()).padStart(2, '0');
|
||||
}
|
||||
|
||||
function copyToClipboard(text) {
|
||||
navigator.clipboard.writeText(text).then(() => {
|
||||
alert('已复制到剪贴板');
|
||||
}).catch(() => {
|
||||
alert('复制失败,请手动复制');
|
||||
});
|
||||
}
|
||||
|
||||
// 快捷键支持
|
||||
$(document).keydown(function(e) {
|
||||
// ESC键返回首页
|
||||
if (e.keyCode === 27 && currentSection !== 'home') {
|
||||
showSection('home');
|
||||
}
|
||||
|
||||
// Ctrl+Enter 执行当前页面的主要操作
|
||||
if (e.ctrlKey && e.keyCode === 13) {
|
||||
switch (currentSection) {
|
||||
case 'homepage':
|
||||
extractHomepage();
|
||||
break;
|
||||
case 'single':
|
||||
downloadSingleArticle();
|
||||
break;
|
||||
case 'list':
|
||||
getArticleList();
|
||||
break;
|
||||
case 'batch':
|
||||
batchDownload();
|
||||
break;
|
||||
case 'data':
|
||||
loadDataList();
|
||||
break;
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
// 页面可见性变化时的处理
|
||||
document.addEventListener('visibilitychange', function() {
|
||||
if (document.hidden) {
|
||||
console.log('页面已隐藏');
|
||||
} else {
|
||||
console.log('页面已显示');
|
||||
// 可以在这里刷新任务状态
|
||||
}
|
||||
});
|
||||
|
||||
// 错误处理
|
||||
window.onerror = function(message, source, lineno, colno, error) {
|
||||
console.error('页面错误:', message, '位置:', source + ':' + lineno);
|
||||
return false;
|
||||
};
|
||||
|
||||
// 控制台欢迎信息
|
||||
console.log(`
|
||||
🚀 微信公众号文章爬虫系统 Web界面
|
||||
====================================
|
||||
版本: 1.0.0
|
||||
开发者: AI Assistant
|
||||
更新时间: 2025-11-27
|
||||
====================================
|
||||
💡 提示:
|
||||
- 按 ESC 键返回首页
|
||||
- 按 Ctrl+Enter 执行当前操作
|
||||
- 所有操作都会显示详细进度
|
||||
====================================
|
||||
`);
|
||||
88
frontend/start_web.bat
Normal file
@@ -0,0 +1,88 @@
|
||||
@echo off
|
||||
chcp 65001 >nul
|
||||
title 微信公众号文章爬虫系统 - Web界面
|
||||
|
||||
echo.
|
||||
echo ===============================================
|
||||
echo 🚀 微信公众号文章爬虫系统
|
||||
echo Web界面启动
|
||||
echo ===============================================
|
||||
echo.
|
||||
|
||||
:: 检查Python是否安装
|
||||
python --version >nul 2>&1
|
||||
if errorlevel 1 (
|
||||
echo ❌ 未检测到Python,正在尝试其他方法...
|
||||
goto :use_powershell
|
||||
) else (
|
||||
echo ✅ 检测到Python环境
|
||||
goto :use_python
|
||||
)
|
||||
|
||||
:use_python
|
||||
echo 📱 使用Python启动Web服务器...
|
||||
echo 🌐 服务地址: http://localhost:8000
|
||||
echo ⏰ 启动时间: %date% %time%
|
||||
echo.
|
||||
echo 💡 提示: 按 Ctrl+C 停止服务器
|
||||
echo ===============================================
|
||||
echo.
|
||||
|
||||
cd /d "%~dp0"
|
||||
python -m http.server 8000
|
||||
goto :end
|
||||
|
||||
:use_powershell
|
||||
echo 📱 使用PowerShell启动Web服务器...
|
||||
echo 🌐 服务地址: http://localhost:8000
echo ⏰ 启动时间: %date% %time%
echo.
echo 💡 提示: 按 Ctrl+C 停止服务器
echo ===============================================
echo.

cd /d "%~dp0"
powershell -Command "
$listener = New-Object System.Net.HttpListener
$listener.Prefixes.Add('http://localhost:8000/')
$listener.Start()
Write-Host '✅ Web服务器已启动: http://localhost:8000'
Write-Host '🌐 正在打开浏览器...'
Start-Process 'http://localhost:8000'
|
||||
|
||||
while ($listener.IsListening) {
|
||||
$context = $listener.GetContext()
|
||||
$request = $context.Request
|
||||
$response = $context.Response
|
||||
|
||||
$path = $request.Url.LocalPath
|
||||
if ($path -eq '/') { $path = '/index.html' }
|
||||
|
||||
$filePath = Join-Path (Get-Location) $path.TrimStart('/')
|
||||
|
||||
if (Test-Path $filePath) {
|
||||
$content = [System.IO.File]::ReadAllBytes($filePath)
|
||||
$response.ContentType = switch ([System.IO.Path]::GetExtension($filePath).ToLower()) {
|
||||
'.html' { 'text/html; charset=utf-8' }
|
||||
'.css' { 'text/css; charset=utf-8' }
|
||||
'.js' { 'application/javascript; charset=utf-8' }
|
||||
'.json' { 'application/json; charset=utf-8' }
|
||||
default { 'text/plain; charset=utf-8' }
|
||||
}
|
||||
$response.ContentLength64 = $content.Length
|
||||
$response.OutputStream.Write($content, 0, $content.Length)
|
||||
} else {
|
||||
$response.StatusCode = 404
|
||||
$response.StatusDescription = 'Not Found'
|
||||
}
|
||||
|
||||
$response.OutputStream.Close()
|
||||
}
|
||||
"
|
||||
|
||||
:end
|
||||
echo.
|
||||
echo ===============================================
|
||||
echo 服务器已停止运行
|
||||
echo ===============================================
|
||||
pause
|
||||
1753
frontend/user-center.html
Normal file
File diff suppressed because it is too large
178
frontend/前端功能问题说明.md
Normal file
@@ -0,0 +1,178 @@
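# 🔧 Front-End Feature Issues and Solutions

## ❌ Current problems

Every front-end feature except "Extract official account homepage" is **not working**, for the following reasons:

### Problem 1: the front end is a pure simulation and never calls the real back end
All download features in the current front-end code are **simulated**:
```javascript
// Simulation only: nothing is actually downloaded
let progress = 0;
const progressInterval = setInterval(() => {
    progress += Math.random() * 20;
    if (progress >= 100) {
        clearInterval(progressInterval); // stop the timer once the fake task is "done"
        endTask('single', 'success', 'Article download complete!'); // fake success message
    }
}, 800);
```

### Problem 2: a browser cannot launch local programs directly
The web front end runs inside the browser, whose security restrictions mean it **cannot invoke a local exe directly**.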

## ✅ Solution

An **HTTP API server** is needed as a bridge between the front end and the back-end program.

### Architecture
```
Front-end page (browser)
    ↓ HTTP request
API server (Go/Node.js)
    ↓ runs command
Back-end crawler (wechat-crawler.exe)
```
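
The browser side of this bridge can be sketched as follows. This is a minimal illustration, not the project's actual code: `buildApiRequest` and `callApi` are hypothetical helper names, the base address and the `{ url }` payload follow what `frontend/js/app.js` sends to `/homepage/extract`, and a runtime that provides `fetch` is assumed.

```javascript
// Base address of the API server, as used by frontend/js/app.js.
const API_BASE_URL = 'http://localhost:8080/api';

// Build the URL and fetch() options for a JSON POST to the API server.
// buildApiRequest is a hypothetical helper, introduced only for this sketch.
function buildApiRequest(endpoint, payload) {
    return {
        url: `${API_BASE_URL}${endpoint}`,
        options: {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(payload),
        },
    };
}

// Send the request and parse the JSON reply (assumes fetch is available).
async function callApi(endpoint, payload) {
    const { url, options } = buildApiRequest(endpoint, payload);
    const response = await fetch(url, options);
    return response.json();
}
```

Keeping the request construction separate from the network call makes the payload easy to inspect without a running server, for example `buildApiRequest('/homepage/extract', { url: articleUrl })`.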

## 🚀 Implementation steps

### Step 1: API server code (already created)

File: `backend/api/server.go`

Main features:
- ✅ Extract official account homepage (`/api/homepage/extract`)
- ⏳ Download a single article (`/api/article/download`)
- ⏳ Get article list (`/api/article/list`)
- ⏳ Batch download (`/api/article/batch`)
- ✅ Get data list (`/api/data/list`)

### Step 2: Build the API server

```bash
cd d:\workspace\Access_wechat_article\backend\api
go build -o api_server.exe server.go
```

### Step 3: Start the API server

```bash
cd d:\workspace\Access_wechat_article\backend
.\api\api_server.exe
```

The server will listen on `http://localhost:8080`.
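
On the front-end side, the endpoints from step 1 and the server address from step 3 could be collected in one lookup table, roughly as sketched below. The names `API_ORIGIN`, `ENDPOINTS` and `endpointUrl` are assumptions made for illustration; only the paths and the address come from this document. The HTTP method of each endpoint is not stated here, so the sketch leaves it out.

```javascript
// Address the API server listens on (step 3).
const API_ORIGIN = 'http://localhost:8080';

// Endpoint paths listed in step 1; the keys are hypothetical names.
const ENDPOINTS = {
    extractHomepage: '/api/homepage/extract',
    downloadArticle: '/api/article/download',
    listArticles: '/api/article/list',
    batchDownload: '/api/article/batch',
    listData: '/api/data/list',
};

// Resolve a logical endpoint name to a full URL, failing loudly on typos.
function endpointUrl(name) {
    if (!Object.prototype.hasOwnProperty.call(ENDPOINTS, name)) {
        throw new Error(`Unknown endpoint: ${name}`);
    }
    return API_ORIGIN + ENDPOINTS[name];
}
```

A single table like this keeps path typos out of individual request calls and makes it obvious which features already have a server-side route.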

### Step 4: Fix the front-end code

The front-end file `js/app.js` was accidentally corrupted; the broken code on line 68 needs to be repaired.

**Broken code** (line 68):
```javascript
<button class="btn btn-info" onclick="copyToClipboard('${homepageUrl.replace(/'/g, "\\'")}')"入下载功能中。`
```

**Should be**:
```javascript
<button class="btn btn-info" onclick="copyToClipboard('${homepageUrl.replace(/'/g, "\\'")}')">📋 复制链接</button>
<button class="btn btn-warning" onclick="openInNewTab('${homepageUrl.replace(/'/g, "\\'")}')">🔗 打开主页</button>
```
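
The inline `onclick` handlers above place a URL inside a JavaScript string that itself sits inside an HTML attribute, so the value passes through two parsers. A more defensive escaping helper might look like the sketch below. The function names are hypothetical and not part of `app.js`; the sketch only covers single-quoted JS strings inside double-quoted attributes.

```javascript
// Escape a value for use inside a single-quoted JavaScript string literal.
function escapeJsString(value) {
    return String(value)
        .replace(/\\/g, '\\\\')   // backslashes first, so later escapes are not doubled
        .replace(/'/g, "\\'")
        .replace(/\r/g, '\\r')
        .replace(/\n/g, '\\n');
}

// Escape a value for use as HTML text or inside a double-quoted attribute.
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, '&amp;')   // ampersands first, so entities are not re-escaped
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// JS-escape first, then HTML-escape: the browser decodes the attribute
// before it parses the handler code, so the layers unwrap in reverse order.
function escapeForInlineHandler(value) {
    return escapeHtml(escapeJsString(value));
}
```

With such a helper the button could be written as `onclick="copyToClipboard('${escapeForInlineHandler(homepageUrl)}')"`. Attaching the handler with `addEventListener` instead of an inline attribute would avoid the double escaping altogether.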
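
## 📋 Currently available features

### ✅ Implemented
1. **Extract official account homepage**: calls the back-end program through the API server

### ⏳ Still to be completed
2. **Download a single article**: the back end needs a matching command-line interface
3. **Get article list**: the back end needs a matching command-line interface
4. **Batch download**: can use the existing feature 5
5. **Data management**: the API already exists; the front end needs to call it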

## 🔨 Complete solution

Because the problem is fairly involved, one of the following simplified options is recommended:

### Option A: command line (recommended)
**Advantages**:
- Simple and direct, no extra development
- Stable and reliable
- Full feature set

**Usage**:
```bash
# Run the back-end program directly
cd backend
.\wechat-crawler.exe

# Then choose a feature from the menu:
#   key 1: extract official account homepage
#   key 3: get article list
#   key 5: batch-download articles
```

### Option B: web interface (needs fixing)
**Work remaining**:
1. ✅ API server created
2. ❌ Front-end JS code needs fixing
3. ❌ Back end needs more command-line interfaces
4. ❌ Front end must be changed to call the real API

**Effort**: roughly 2-3 hours of development

## 💡 Temporary workaround

Until the API server and front-end code are fully fixed:

### 1. Use the command-line program
```bash
cd d:\workspace\Access_wechat_article\backend
.\wechat-crawler.exe
```

### 2. Use only the "Extract official account homepage" feature
This feature already works (through the API server).

### 3. Run the other features directly from the command line
- Feature 3: get article list
- Feature 5: batch-download articles

## 📊 Feature comparison

| Feature | Command line | Web interface | Status |
|------|--------|---------|------|
| Extract official account homepage | ✅ | ✅ | Available |
| Get article list | ✅ | ❌ | Command line only |
| Batch-download articles | ✅ | ❌ | Command line only |
| View data | ✅ | ⏳ | Needs fixing |

## 🎯 Suggested next steps

### Option 1: keep using the command line (recommended)
- Full feature set and stable
- No extra development
- Usable immediately

### Option 2: complete the web interface
Work required:
1. Fix the front-end JS errors
2. Implement the full API call logic
3. Test every feature

**Estimated time**: 2-3 hours

## 🔍 Locating the error

The main problem in the current front-end code:
- File: `frontend/js/app.js`
- Line: 68
- Problem: a string concatenation error that causes a syntax error

## 📞 Technical support

To complete the web interface:
1. First fix the syntax error on line 68 of `app.js`
2. Check that the API server is running correctly
3. Debug and finish the features one at a time
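
When checking whether the API server is running (step 2), it helps to tell "server unreachable" apart from "server replied with an error". The sketch below assumes the three values available in a jQuery `error` callback: the HTTP status, the error text, and the parsed JSON body if there is one. `describeAjaxError` is a hypothetical name. A status of `0` usually means no HTTP response arrived at all, which can also happen when the browser blocks a cross-origin request.

```javascript
// Turn the outcome of a failed request into a message for the user.
// status:    HTTP status code (0 when no response was received)
// errorText: short error text reported by the HTTP client
// body:      parsed JSON error body, or undefined if there is none
function describeAjaxError(status, errorText, body) {
    if (status === 0) {
        return 'Cannot reach the API server; check that it is running on http://localhost:8080';
    }
    if (body && typeof body.message === 'string' && body.message !== '') {
        return body.message; // prefer the server's own explanation
    }
    return `Request failed: ${errorText}`;
}
```

Inside a jQuery handler this could be called as `describeAjaxError(xhr.status, error, xhr.responseJSON)`, so every feature reports connection problems in the same way.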

---

**Current status**: use the command-line program first; it is complete and stable. The web interface can be treated as a future improvement.
|
||||
49
启动Web系统.bat
Normal file
@@ -0,0 +1,49 @@
|
||||
@echo off
|
||||
chcp 65001 >nul
|
||||
cls
|
||||
|
||||
echo ===============================================
|
||||
echo 🚀 微信公众号文章爬虫 - Web系统启动器
|
||||
echo ===============================================
|
||||
echo.
|
||||
echo 正在启动系统,请稍候...
|
||||
echo.
|
||||
|
||||
:: 启动API服务器(后台运行)
|
||||
echo [1/2] 启动 API 服务器...
|
||||
cd backend\api
|
||||
start "微信爬虫-API服务器" cmd /c "start_api.bat"
|
||||
cd ..\..
|
||||
timeout /t 2 /nobreak >nul
|
||||
|
||||
:: 启动前端服务器
|
||||
echo [2/2] 启动 前端服务器...
|
||||
cd frontend
|
||||
start "微信爬虫-前端服务器" cmd /c "start_web.bat"
|
||||
cd ..
|
||||
|
||||
echo.
|
||||
echo ===============================================
|
||||
echo ✅ 系统启动完成!
|
||||
echo ===============================================
|
||||
echo.
|
||||
echo 📝 重要提示:
|
||||
echo.
|
||||
echo 1️⃣ API服务器: http://localhost:8080
|
||||
echo - 提供后端接口服务
|
||||
echo - 窗口标题: "微信爬虫-API服务器"
|
||||
echo.
|
||||
echo 2️⃣ 前端界面: http://localhost:8000
|
||||
echo - Web操作界面
|
||||
echo - 窗口标题: "微信爬虫-前端服务器"
|
||||
echo.
|
||||
echo ⚠️ 请不要关闭这两个窗口!
|
||||
echo.
|
||||
echo 💡 使用说明:
|
||||
echo - 浏览器会自动打开前端界面
|
||||
echo - 如未自动打开,请手动访问 http://localhost:8000
|
||||
echo - 使用完毕后,关闭两个服务器窗口即可
|
||||
echo.
|
||||
echo ===============================================
|
||||
|
||||
pause
|
||||