# AI Image Tag Derivation System

An automatic tag-derivation system for medical and health images, built on the Qwen vision-language model (Qwen-VL).
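
## Features

- **Daemon mode**: continuously monitors the database and processes new rows automatically (polls every 2 seconds by default)
- **Batch processing**: sends 10 images per model request and runs multiple requests concurrently
- **Content-moderation handling**: detects images rejected by content moderation, marks their status, and records the reason
- **RESTful API service**: exposes tag derivation over HTTP
- **Smart retries**: automatically retries failed API calls with exponential backoff
- **Centralized configuration**: settings can be overridden with environment variables for flexible deployment
- **Full logging**: writes to both a log file and the console for easier troubleshooting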
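
The retry behavior can be pictured with a small exponential-backoff sketch. This is not the interface of `retry_handler.py`, which is not documented here: `retry_with_backoff`, its parameters, and the 10% jitter are hypothetical choices for illustration. The wait doubles on each attempt (`base_delay * 2**attempt`) and is capped at `max_delay`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_retries: int = 3,  # same default as QWEN_MAX_RETRIES in the config table
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests do not wait
) -> T:
    """Call func(); on a retryable error, wait and try again up to max_retries times."""
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: re-raise the last error
            # Exponential backoff; with base_delay=1.0 this is 1s, 2s, 4s, ... capped at max_delay
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Up to 10% random jitter so concurrent workers do not retry in lockstep
            delay += random.uniform(0, delay * 0.1)
            sleep(delay)
            attempt += 1
```

Listing only transient error types in `retry_on` lets non-retryable failures (for example, a content-moderation rejection) propagate immediately instead of being retried pointlessly.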

## Tag Derivation Flow

```
1. Query the database
   └─ SELECT * FROM ai_image_tags WHERE status='tag_extension'

2. Build the image URLs
   └─ full URL = CDN base URL + image_url

3. Split into groups of 10 and send the requests concurrently
   ├─ Prompt: Image 1: ID=123, original tag "hypertension", keyword "cardiovascular"
   ├─ Prompt: Image 2: ID=124, original tag "diabetes", keyword "endocrine"
   └─ + the 10 original image URLs

4. The model returns the derived tags
   └─ {"results": [{"id": 123, "derived_tags": ["derived tag 1", "derived tag 2"]}]}

5. Update the database
   ├─ ai_tags table: INSERT the derived tags
   ├─ ai_image_tags table: UPDATE tag_id, tag_name
   └─ status: tag_extension → manual_review

6. Handle moderation failures (DataInspectionFailed)
   ├─ status → automated_review_failed
   └─ record the reason in automated_review_failed_reason
```
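
Steps 2 and 3 can be sketched in Python as follows. This is an illustrative sketch, not the code in `image_tag_derive.py`: the helpers `build_full_url`, `chunk`, `build_prompt`, and `derive_all`, the row keys (`id`, `image_url`, `tag_name`, `keyword`), and the injected `call_model` coroutine are all assumptions, and the real prompt text lives in `promt/qwen_tag_derive_prompt.py`. An `asyncio.Semaphore` caps how many requests are in flight at once.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical model client: takes the prompt text and the image URLs, returns parsed JSON
ModelCall = Callable[[str, list[str]], Awaitable[dict]]


def build_full_url(cdn_base: str, image_url: str) -> str:
    """Step 2: full URL = CDN base URL + image_url, with exactly one '/' between them."""
    if image_url.startswith(("http://", "https://")):
        return image_url  # assumption: already-absolute URLs are used unchanged
    return cdn_base.rstrip("/") + "/" + image_url.lstrip("/")


def chunk(rows: list[dict], size: int = 10) -> list[list[dict]]:
    """Step 3: split rows into groups of `size` images, one model request per group."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def build_prompt(group: list[dict]) -> str:
    """One line per image, in the illustrative layout of the flow diagram."""
    return "\n".join(
        f'Image {n}: ID={row["id"]}, original tag "{row["tag_name"]}", keyword "{row["keyword"]}"'
        for n, row in enumerate(group, start=1)
    )


async def derive_all(groups: list[list[dict]], call_model: ModelCall,
                     cdn_base: str, concurrency: int = 3) -> list[dict]:
    """Send one request per group with at most `concurrency` requests in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def run(group: list[dict]) -> dict:
        async with sem:  # waits here while `concurrency` requests are already running
            urls = [build_full_url(cdn_base, row["image_url"]) for row in group]
            return await call_model(build_prompt(group), urls)

    # gather preserves input order, so results[i] belongs to groups[i]
    return list(await asyncio.gather(*(run(g) for g in groups)))
```

Keeping `call_model` injectable separates the batching and concurrency logic from the DashScope client, so the grouping can be tested without network access.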

## Project Structure

```
ai_tagging_images/
├── config/
│   ├── __init__.py
│   └── settings.py            # central configuration
├── logs/                      # log directory (created automatically)
├── promt/
│   └── qwen_tag_derive_prompt.py
├── database_config.py         # database connection and DAO
├── image_tag_derive.py        # main tag-derivation program (supports daemon mode)
├── start_tag_derive.sh        # deployment management script
├── logger.py                  # logging module
├── retry_handler.py           # retry mechanism
├── tag_derive_api.py          # FastAPI service
├── query_tags.py              # tag query tool
├── check_results.py           # result checking tool
├── requirements.txt           # dependency list
└── ai_article.sql             # database schema
```

## Quick Start

### 1. Install dependencies

```bash
pip install -r requirements.txt
```

### 2. Configure environment variables (optional)

```bash
# Windows (cmd.exe)
set DASHSCOPE_API_KEY=your-api-key
set DB_HOST=localhost
set DB_PASSWORD=your-password

# Linux/Mac
export DASHSCOPE_API_KEY=your-api-key
export DB_HOST=localhost
export DB_PASSWORD=your-password
```

### 3. Run the tag-derivation service

**Daemon mode (recommended):**

```bash
# Continuously monitor the database and process new rows automatically
python image_tag_derive.py --daemon

# Set the polling interval in seconds (default: 2)
python image_tag_derive.py --daemon --interval 2

# Set the batch size and concurrency
python image_tag_derive.py --daemon --batch-size 50 --concurrency 3
```
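
Daemon mode amounts to a polling loop. The sketch below assumes two injected callables, a hypothetical `fetch_pending()` (standing in for the `status='tag_extension'` query) and `process(rows)`; it is not the actual loop in `image_tag_derive.py`. The `max_cycles` and `sleep` parameters exist only to make it testable.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("tag_derive.daemon")


def run_daemon(
    fetch_pending: Callable[[], list],
    process: Callable[[list], None],
    interval: float = 2.0,  # matches the documented --interval default
    max_cycles: Optional[int] = None,  # None = run forever; set only in tests
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for pending rows and process them; return the number of rows processed."""
    processed = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        try:
            rows = fetch_pending()
            if rows:
                process(rows)
                processed += len(rows)
                continue  # a backlog may remain: poll again without sleeping
        except Exception:
            # Keep the daemon alive on transient database/network errors
            log.exception("polling cycle failed; retrying after %s s", interval)
        sleep(interval)  # nothing to do (or an error): wait before the next poll
    return processed
```

Polling again right after a non-empty batch drains a backlog quickly, while the `interval` sleep only applies when there is nothing to do or a cycle failed.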

**Single-run mode:**

```bash
# Process all pending rows
python image_tag_derive.py

# Test mode: process only the given number of rows
python image_tag_derive.py --limit 10

# Start from a given ID (resume an interrupted run)
python image_tag_derive.py --start-id 100

# Process an ID range
python image_tag_derive.py --start-id 100 --end-id 200

# Set the batch size and concurrency
python image_tag_derive.py --batch-size 50 --concurrency 5

# Process specific IDs (one or more)
python image_tag_derive.py --id 16495
python image_tag_derive.py --id 16495 16496 16497
```

**Command-line options:**

| Option | Description |
|------|------|
| `--daemon` | Daemon mode: keep monitoring the database |
| `--interval` | Polling interval in seconds (default: 2) |
| `--limit` | Maximum number of rows to process (for testing) |
| `--start-id` | Starting ID (for resuming an interrupted run) |
| `--end-id` | Ending ID |
| `--batch-size` | Number of rows read from the database per batch |
| `--concurrency` | Number of concurrent requests |
| `--id` | List of IDs to process |
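
A parser accepting the same options can be written with `argparse`. This is a reference sketch, not the parser in `image_tag_derive.py`: only the `--interval` default of 2 seconds comes from this README; the other defaults are left as `None` because the real ones are not documented here, and `dest="ids"` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Options mirroring the table above; defaults other than --interval are unknown."""
    p = argparse.ArgumentParser(description="Derive image tags with Qwen-VL")
    p.add_argument("--daemon", action="store_true", help="keep monitoring the database")
    p.add_argument("--interval", type=float, default=2.0, help="polling interval in seconds")
    p.add_argument("--limit", type=int, help="max rows to process (testing)")
    p.add_argument("--start-id", type=int, help="starting ID (resume)")
    p.add_argument("--end-id", type=int, help="ending ID")
    p.add_argument("--batch-size", type=int, help="rows read from the database per batch")
    p.add_argument("--concurrency", type=int, help="number of concurrent requests")
    # nargs="+" accepts one or more IDs: --id 16495 16496 16497
    p.add_argument("--id", type=int, nargs="+", dest="ids", help="IDs to process")
    return p
```

`argparse` converts dashes to underscores, so `--start-id` and `--batch-size` are read back as `args.start_id` and `args.batch_size`.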

### 4. Deployment management script

```bash
# Start the service
./start_tag_derive.sh start

# Stop the service
./start_tag_derive.sh stop

# Force stop
./start_tag_derive.sh force-stop

# Restart the service
./start_tag_derive.sh restart

# Check status
./start_tag_derive.sh status

# View logs
./start_tag_derive.sh logs
./start_tag_derive.sh logs-follow
```

### 5. Start the API service

```bash
python tag_derive_api.py
```

Once the service is running, open:

- API docs: http://127.0.0.1:8000/docs
- Health check: http://127.0.0.1:8000/health

## API Endpoints

| Method | Endpoint | Description |
|------|------|------|
| GET | `/` | Service status |
| GET | `/health` | Health check |
| POST | `/api/derive/single` | Derive tags for a single image |
| POST | `/api/derive/batch` | Batch tag derivation (up to 5 images) |
| POST | `/api/derive/async` | Asynchronous batch task |
| GET | `/api/task/{task_id}` | Query task status |
| GET | `/api/stats` | Statistics |
| GET | `/api/pending` | List of pending items |

### Example requests

**Single-image derivation:**

```bash
curl -X POST http://127.0.0.1:8000/api/derive/single \
  -H "Content-Type: application/json" \
  -d '{
    "image_url": "https://example.com/image.jpg",
    "tag_name": "hypertension"
  }'
```
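
The same request can be sent from Python using only the standard library. `make_single_request` and `derive_single` are hypothetical helper names; the endpoint, the JSON fields, and the default address come from the curl example and the API service section, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8000"  # default host/port used in this README


def make_single_request(image_url: str, tag_name: str,
                        base: str = API_BASE) -> urllib.request.Request:
    """Build the same POST /api/derive/single request as the curl example."""
    body = json.dumps({"image_url": image_url, "tag_name": tag_name}, ensure_ascii=False)
    return urllib.request.Request(
        f"{base}/api/derive/single",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def derive_single(image_url: str, tag_name: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (the API service must be running)."""
    with urllib.request.urlopen(make_single_request(image_url, tag_name), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`ensure_ascii=False` plus explicit UTF-8 encoding keeps non-ASCII tag names readable in the request body.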

**Response:**

```json
{
  "success": true,
  "original_tag": "hypertension",
  "derived_tags": ["elevated blood pressure", "cardiovascular disease", "antihypertensive drugs", "blood pressure monitoring"],
  "merged_tag": "#hypertension##elevated blood pressure##cardiovascular disease##antihypertensive drugs##blood pressure monitoring#"
}
```
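
In the response, `merged_tag` wraps the original tag and each derived tag in `#...#` and concatenates them. A minimal sketch of building and parsing that format is below; the function names are hypothetical, and dropping blank or duplicate tags is an assumption rather than documented behavior.

```python
import re


def merge_tags(original: str, derived: list[str]) -> str:
    """Build '#original##derived1##derived2#...' (original first)."""
    seen: set[str] = set()
    parts: list[str] = []
    for tag in [original, *derived]:
        tag = tag.strip()
        if tag and tag not in seen:  # assumption: skip empty and repeated tags
            seen.add(tag)
            parts.append(f"#{tag}#")
    return "".join(parts)


def split_merged(merged: str) -> list[str]:
    """Inverse of merge_tags: extract the text between each '#...#' pair."""
    return re.findall(r"#([^#]+)#", merged)
```

Because tags themselves must not contain `#`, `split_merged` can recover the list without any escaping scheme.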

## Configuration

| Environment variable | Default | Description |
|----------|--------|------|
| `DASHSCOPE_API_KEY` | - | Qwen (DashScope) API key |
| `DB_HOST` | localhost | Database host |
| `DB_PORT` | 3306 | Database port |
| `DB_USER` | root | Database user |
| `DB_PASSWORD` | - | Database password |
| `DB_DATABASE` | ai_article | Database name |
| `BATCH_SIZE` | 3 | Images processed per batch |
| `QWEN_MAX_RETRIES` | 3 | Maximum number of retries |
| `LOG_LEVEL` | INFO | Log level |
| `API_PORT` | 8000 | API service port |
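
`config/settings.py` reads these variables, but its code is not shown here. A minimal sketch of the same idea, using the defaults from the table, is below. The `Settings` dataclass and `load_settings` function are hypothetical, and treating an empty value as unset is an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    dashscope_api_key: str
    db_host: str
    db_port: int
    db_user: str
    db_password: str
    db_database: str
    batch_size: int
    qwen_max_retries: int
    log_level: str
    api_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from environment variables, falling back to the documented defaults."""

    def get_str(name: str, default: str) -> str:
        return env.get(name) or default  # empty string counts as unset

    def get_int(name: str, default: int) -> int:
        value = env.get(name, "").strip()
        return int(value) if value else default

    return Settings(
        dashscope_api_key=get_str("DASHSCOPE_API_KEY", ""),  # "-" in the table: no default
        db_host=get_str("DB_HOST", "localhost"),
        db_port=get_int("DB_PORT", 3306),
        db_user=get_str("DB_USER", "root"),
        db_password=get_str("DB_PASSWORD", ""),  # "-" in the table: no default
        db_database=get_str("DB_DATABASE", "ai_article"),
        batch_size=get_int("BATCH_SIZE", 3),
        qwen_max_retries=get_int("QWEN_MAX_RETRIES", 3),
        log_level=get_str("LOG_LEVEL", "INFO").upper(),
        api_port=get_int("API_PORT", 8000),
    )
```

Accepting the environment as a parameter makes the loader easy to test with a plain dict instead of mutating `os.environ`.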

## Tech Stack

- **LLM**: Alibaba Cloud Qwen-VL-Max
- **Web framework**: FastAPI
- **Database**: MySQL 9.0
- **Python**: 3.10+

## Data Tables

The main tables involved are:

- `ai_image_tags`: image-tag association table
- `ai_tags`: master tag table

## Status Transitions

```
tag_extension → manual_review            (derivation succeeded)
tag_extension → automated_review_failed  (content moderation failed)
```
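
The transitions above, together with step 4 of the derivation flow (parsing `{"results": [...]}`), can be sketched as two pure functions. `parse_results` and `next_status` are hypothetical names. In particular, leaving rows in `tag_extension` after other (non-moderation) errors so that a later poll retries them is an assumption, since only the two transitions shown are documented.

```python
import json
from typing import Optional

TAG_EXTENSION = "tag_extension"
MANUAL_REVIEW = "manual_review"
REVIEW_FAILED = "automated_review_failed"


def parse_results(raw: str) -> dict[int, list[str]]:
    """Map image id -> derived_tags from the model's {"results": [...]} JSON."""
    data = json.loads(raw)
    return {int(item["id"]): list(item.get("derived_tags", [])) for item in data.get("results", [])}


def next_status(error_code: Optional[str], error_message: str = "") -> tuple[str, Optional[str]]:
    """Return (new status, value for automated_review_failed_reason) after a model call."""
    if error_code is None:
        return MANUAL_REVIEW, None  # derivation succeeded
    if error_code == "DataInspectionFailed":
        return REVIEW_FAILED, error_message or error_code  # rejected by content moderation
    # Assumption: any other error leaves the row pending so a later poll retries it
    return TAG_EXTENSION, None
```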

**Content-moderation failure handling:**

- Triggered when the model returns a `DataInspectionFailed` error
- `status` is automatically set to `'automated_review_failed'`
- The failure reason is recorded in the `automated_review_failed_reason` field

## Logs

Log files are stored in the `logs/` directory and named by date:

```
logs/
└── tag_derive_20260130.log
```
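
A file-plus-console logger that produces the file name pattern above can be built with the standard `logging` module. This is a sketch, not the contents of `logger.py`; `setup_logger` and its parameters are hypothetical, and the format string is an arbitrary choice.

```python
import logging
import sys
from datetime import date
from pathlib import Path
from typing import Optional, Union


def setup_logger(
    name: str = "tag_derive",
    log_dir: Union[str, Path] = "logs",
    level: str = "INFO",
    today: Optional[date] = None,  # injectable for tests; defaults to the current date
) -> logging.Logger:
    """Log to <log_dir>/<name>_YYYYMMDD.log and to stdout with the same format."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create logs/ automatically
    stamp = (today or date.today()).strftime("%Y%m%d")

    logger = logging.getLogger(name)
    logger.setLevel(level.upper())
    logger.propagate = False  # avoid duplicate lines via the root logger
    for old in list(logger.handlers):  # make repeated calls idempotent
        logger.removeHandler(old)
        old.close()

    fmt = logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s")
    file_handler = logging.FileHandler(directory / f"{name}_{stamp}.log", encoding="utf-8")
    console_handler = logging.StreamHandler(sys.stdout)
    for handler in (file_handler, console_handler):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

The file name is fixed when the logger is set up, so a long-running daemon would keep writing to the first day's file; `logging.handlers.TimedRotatingFileHandler` with `when="midnight"` is the standard-library way to roll over daily.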

## License

MIT