# Baijiahao Article Export Tool

A web tool for exporting the articles a Baijiahao author published within a specified time range.

## Quick Start

### Method 1: Start with Gunicorn (recommended for production)

```bash
# Grant execute permission (first run only)
chmod +x start.sh stop.sh

# Install gunicorn (if not already installed)
pip install gunicorn

# Start with Gunicorn (the default mode)
./start.sh
# Or specify the mode explicitly
./start.sh gunicorn

# Stop the service
./stop.sh
```
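
If you prefer to keep Gunicorn settings in a file rather than rely on whatever `start.sh` passes on the command line, a Gunicorn configuration file (which is plain Python) is one option. The sketch below is an assumption, not a file shipped with the project: the file name `gunicorn.conf.py`, the worker count, and the timeout are illustrative values. Only the bind address is taken from this README.

```python
# gunicorn.conf.py: hypothetical config file, not part of the project.
import multiprocessing

# Address documented in this README.
bind = "127.0.0.1:8030"

# Assumption: a small worker count is enough for this tool.
workers = min(2, multiprocessing.cpu_count())

# Assumption: exports can be slow, so allow long-running requests (seconds).
timeout = 120

# "-" sends the access log to stdout and the error log to stderr.
accesslog = "-"
errorlog = "-"
```

It could then be loaded with `gunicorn -c gunicorn.conf.py app:app`, assuming the Flask instance inside `app.py` is named `app`.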

### Method 2: Start with nohup (development and testing)

```bash
# Start in nohup mode
./start.sh nohup

# Stop the service
./stop.sh
```

### Method 3: Start manually

```bash
# 1. Create a virtual environment (first run only)
python3 -m venv .venv

# 2. Activate the virtual environment
source .venv/bin/activate

# 3. Install dependencies (first run only)
pip install -r requirements.txt

# 4. Start the service
python app.py
```

The service will be available at `http://127.0.0.1:8030`.
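
To confirm that the service really is listening after any of the start methods, a small standard-library probe can help. This is a sketch under assumptions: the function name `is_service_up` is made up for illustration, and any HTTP reply (including an error status or a login redirect) is treated as "running".

```python
import urllib.error
import urllib.request

# Bypass any system-wide proxy: the target is a local address.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_service_up(url="http://127.0.0.1:8030", timeout=3.0):
    """Return True if something answers HTTP at `url`, False otherwise."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is running.
        return True
    except OSError:
        # Connection refused, timeout, or similar: nothing is listening.
        return False
```

Calling `is_service_up()` with no arguments checks the default address given above.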
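
## Features

- 📝 Export article information for a Baijiahao author within a specified time range
- 📋 Task queue with support for offline processing
- 🔄 Dynamic concurrency that adjusts the number of threads automatically
- 📊 Output in Excel format
- 🎯 Each record includes the article title, link, and publish time
- 🎨 Clean web interface (DingTalk-style tech-blue theme)
- 🔐 User login and permission system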
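
The first feature boils down to keeping only the articles whose publish time falls inside the requested window. A minimal sketch of that filter follows; the record layout (`title`, `url`, `publish_time` keys) and the sample data are assumptions made for illustration, not the project's actual data model.

```python
from datetime import datetime


def filter_by_publish_time(articles, start, end):
    """Keep articles whose publish time lies within [start, end], inclusive."""
    return [a for a in articles if start <= a["publish_time"] <= end]


# Hypothetical sample records; real data would come from the author's page.
articles = [
    {"title": "Post A", "url": "https://example.com/a", "publish_time": datetime(2024, 1, 5, 9, 30)},
    {"title": "Post B", "url": "https://example.com/b", "publish_time": datetime(2024, 2, 20, 18, 0)},
    {"title": "Post C", "url": "https://example.com/c", "publish_time": datetime(2024, 3, 1, 8, 15)},
]

selected = filter_by_publish_time(
    articles, datetime(2024, 2, 1), datetime(2024, 2, 29, 23, 59, 59)
)
print([a["title"] for a in selected])  # ['Post B']
```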

## Tech Stack

- **Backend**: Python + Flask
- **Frontend**: HTML + CSS + jQuery
- **Data processing**: Pandas + BeautifulSoup4
- **Excel export**: OpenPyXL

## Installation

### 1. Clone the project

```bash
git clone <repository-url>
cd ai_baijiahao
```

### 2. Create a virtual environment

```bash
python3 -m venv .venv
source .venv/bin/activate  # Linux/Mac
# or
.venv\Scripts\activate  # Windows
```

### 3. Install dependencies

```bash
pip install -r requirements.txt
```

### 4. Start the service

```bash
# Use the startup script (Linux/Mac)
chmod +x start.sh
./start.sh

# Or start manually
python app.py
```

The service will be available at `http://127.0.0.1:8030`.

## Usage

### Logging in

1. Register an account on your first visit.
2. Log in with your username and password.

### Instant export

1. Open the Baijiahao author's homepage in a browser and copy the full URL.
   - Example: `https://baijiahao.baidu.com/u?app_id=1700253559210167`

2. Paste the URL into the tool page and select a time range.

3. Click "Start Export" and wait for the data to be fetched.

4. When the export succeeds, click "Download Excel File" to save the file.
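
Step 1 above expects a full author homepage URL. A quick way to check a URL before submitting it is to pull out its `app_id` query parameter. The sketch below is illustrative only: the function name `extract_app_id` is hypothetical, and the rules it enforces (host `baijiahao.baidu.com`, a numeric `app_id`) are assumptions inferred from the single example URL, not documented behavior of the tool.

```python
from urllib.parse import parse_qs, urlparse


def extract_app_id(url):
    """Return the app_id from an author homepage URL, or raise ValueError."""
    parsed = urlparse(url.strip())
    # Assumption: author pages live on this host, as in the README example.
    if parsed.scheme not in ("http", "https") or parsed.hostname != "baijiahao.baidu.com":
        raise ValueError("not a Baijiahao author homepage URL: %r" % url)
    values = parse_qs(parsed.query).get("app_id", [])
    # Assumption: app_id is a single all-digit value, as in the README example.
    if len(values) != 1 or not values[0].isdigit():
        raise ValueError("missing or malformed app_id in URL: %r" % url)
    return values[0]


print(extract_app_id("https://baijiahao.baidu.com/u?app_id=1700253559210167"))
# 1700253559210167
```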

### Queue export

1. Click the "Task Queue" menu.

2. Add multiple export tasks to the queue.

3. The system processes them concurrently, adjusting dynamically between 1 and 3 threads.

4. When a task finishes, click "View" to download the Excel file.
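
This README does not say how the thread count is chosen within the 1 to 3 range. One simple policy, shown here purely as an assumption, is to size the pool by the number of pending tasks and clamp it to that range. The names `pick_worker_count` and `run_batch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_WORKERS = 1
MAX_WORKERS = 3


def pick_worker_count(pending_tasks):
    """Clamp the pool size to the 1-3 range based on queue length."""
    return max(MIN_WORKERS, min(MAX_WORKERS, pending_tasks))


def run_batch(tasks, handler):
    """Run `handler` on every task concurrently; results keep task order."""
    if not tasks:
        return []
    with ThreadPoolExecutor(max_workers=pick_worker_count(len(tasks))) as pool:
        return list(pool.map(handler, tasks))


print(pick_worker_count(0), pick_worker_count(2), pick_worker_count(10))  # 1 2 3
print(run_batch(["a", "b", "c", "d"], str.upper))  # ['A', 'B', 'C', 'D']
```

A real scheduler might also react to error rates or response times from the remote site; queue length is just the simplest signal to demonstrate.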

## Production Deployment

### Option 1: systemd service (recommended)

**Advantages**:
- ✅ Automatic restart when the process crashes
- ✅ Start on boot
- ✅ Resource limits
- ✅ Log management
- ✅ Service monitoring

**Installation steps**:

```bash
# 1. Install the service
sudo chmod +x install_service.sh
sudo ./install_service.sh

# 2. Start the service
sudo systemctl start baijiahao

# 3. Check the status
sudo systemctl status baijiahao

# 4. Follow the logs
sudo journalctl -u baijiahao -f
```

**Common commands**:

```bash
# Start / stop / restart
sudo systemctl start baijiahao
sudo systemctl stop baijiahao
sudo systemctl restart baijiahao

# Check status and follow logs
sudo systemctl status baijiahao
sudo journalctl -u baijiahao -f

# Disable / enable start on boot
sudo systemctl disable baijiahao
sudo systemctl enable baijiahao
```

### Option 2: nohup (simple setups)

**Advantages**: simple and quick to set up

**Disadvantages**: no automatic restart, no start on boot, harder to manage

```bash
# Use the startup script shipped with the project (nohup mode)
./start.sh nohup

# Or run nohup by hand
mkdir -p logs  # make sure the log directory exists
nohup python app.py > logs/app.log 2>&1 &
```

### Option 3: Supervisor (alternative)

Install Supervisor:

```bash
sudo apt-get install supervisor
```

Create the configuration file `/etc/supervisor/conf.d/baijiahao.conf`:

```ini
[program:baijiahao]
command=/var/www/ai_baijiahao/.venv/bin/python app.py
directory=/var/www/ai_baijiahao
user=www-data
autostart=true
autorestart=true
stdout_logfile=/var/www/ai_baijiahao/logs/app.log
stderr_logfile=/var/www/ai_baijiahao/logs/error.log
```
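
Before handing the file to Supervisor, it can be worth checking that no key was dropped while copying it. The sketch below parses the text with Python's `configparser` and reports which of the keys used in this README are absent. The helper name and the `REQUIRED_KEYS` list are assumptions for illustration; they mirror this README's example, not what Supervisor itself requires.

```python
import configparser

# Keys this README's example sets (not Supervisor's own required set).
REQUIRED_KEYS = ("command", "directory", "user", "autostart", "autorestart")


def missing_supervisor_keys(config_text, program="baijiahao"):
    """Return the README keys absent from the [program:<name>] section."""
    # interpolation=None keeps any literal % characters in values intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = "program:" + program
    if not parser.has_section(section):
        raise ValueError("section [%s] not found" % section)
    return [key for key in REQUIRED_KEYS if not parser.has_option(section, key)]


# Deliberately incomplete sample to show the report.
sample = """
[program:baijiahao]
command=/var/www/ai_baijiahao/.venv/bin/python app.py
directory=/var/www/ai_baijiahao
autostart=true
"""
print(missing_supervisor_keys(sample))  # ['user', 'autorestart']
```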

Start the service:

```bash
sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl start baijiahao
```

## Project Structure

```
ai_baijiahao/
├── app.py              # Flask backend service
├── requirements.txt    # Python dependencies
├── templates/          # HTML templates
│   └── index.html
├── static/             # Static assets
│   ├── css/
│   │   └── style.css
│   └── js/
│       └── main.js
└── exports/            # Excel export directory (created automatically)
```
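
The `exports/` directory is described as created automatically. One common way to get that behavior, shown here as an assumption rather than the project's actual code, is to create the directory on demand right before building the output path. The helper name and the file-name pattern are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def build_export_path(app_id, base_dir="exports", now=None):
    """Ensure the export directory exists and return a timestamped .xlsx path."""
    now = now or datetime.now()
    export_dir = Path(base_dir)
    # parents/exist_ok make this safe to call on every export.
    export_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: author id plus timestamp.
    return export_dir / "baijiahao_{}_{:%Y%m%d_%H%M%S}.xlsx".format(app_id, now)
```

For example, `build_export_path("1700253559210167")` would return a path under `exports/` and create that directory if it is missing.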

## Notes

- Make sure the URL you enter is a valid Baijiahao author homepage address.
- Exporting can take some time; please be patient.
- The more articles an author has, the longer the export takes.
- This tool is intended for learning and exchange purposes only.

## License

For learning and exchange purposes only.