# AI MIP Query Task Table Setup

## 1. Create the Table

Run the following SQL file against your MySQL database:

```bash
mysql -u your_user -p your_database < db/ai_mip_query_task.sql
```

Alternatively, paste the contents of `db/ai_mip_query_task.sql` into a MySQL client and execute them directly.

## 2. Table Structure

### Columns

| Column | Type | Description |
|--------|------|-------------|
| id | int | Primary key ID |
| query_word | varchar(512) | Query word / keyword |
| query_type | enum | Query type: keyword/phrase/long_tail |
| task_date | char(8) | Task date, YYYYMMDD |
| threshold_max | int | Maximum number of items to crawl (threshold) |
| current_count | int | Number of items crawled so far |
| status | enum | Task status: ready/doing/failed/finished/closed |
| priority | tinyint | Priority, 1-10 |
| category | varchar(64) | Category tag |
| source_platform | varchar(64) | Source platform |
| crawl_url_count | int | Number of URLs crawled |
| valid_url_count | int | Number of valid URLs (those carrying ads) |
| error_message | text | Error message |
| started_at | timestamp | Time execution started |
| finished_at | timestamp | Time the task finished |
| closed_at | timestamp | Time the task was closed after reaching the threshold |
| created_at | timestamp | Creation time |
| updated_at | timestamp | Last update time |
| created_by | varchar(64) | Creator |
| remark | varchar(512) | Remarks |

### Indexes

- `uniq_query_date`: unique constraint, at most one task per query word per day
- `idx_date_status`: lookups by date and status
- `idx_status_priority`: lookups by status and priority
- `idx_category`: lookups by category
- `idx_threshold`: threshold monitoring
- `idx_closed`: index on the close time

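
The effect of `uniq_query_date` can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in for MySQL, and it assumes that the table is named `ai_mip_query_task` (after the SQL file) and that the unique index covers `(query_word, task_date)`; the actual definition lives in `db/ai_mip_query_task.sql`. The helper `insert_task` is hypothetical and is not part of `db_manager`.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ai_mip_query_task (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        query_word TEXT NOT NULL,
        task_date TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'ready'
    )
    """
)
# Assumed column list for uniq_query_date: (query_word, task_date).
conn.execute(
    "CREATE UNIQUE INDEX uniq_query_date "
    "ON ai_mip_query_task (query_word, task_date)"
)


def insert_task(query_word: str, task_date: str) -> bool:
    """Insert a task; return False if the unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ai_mip_query_task (query_word, task_date) "
                "VALUES (?, ?)",
                (query_word, task_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = insert_task("糖尿病治疗", "20260119")      # new word/date pair
duplicate = insert_task("糖尿病治疗", "20260119")  # same word, same day
next_day = insert_task("糖尿病治疗", "20260120")   # same word, next day
print(first, duplicate, next_day)
```

Under these assumptions, the second insert is rejected while the same query word on a different date is accepted.
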
## 3. Usage Example

### Python Code

```python
from db_manager import QueryTaskManager

# Initialize the manager
task_mgr = QueryTaskManager()

# Create a task
task_id = task_mgr.create_task(
    query_word="糖尿病治疗",
    query_type="keyword",
    threshold_max=50,
    priority=3,
    category="医疗"
)

# Fetch tasks in the ready state
ready_tasks = task_mgr.get_ready_tasks(limit=10)

# Update the task status
task_mgr.update_task_status(task_id, 'doing')

# Increment the crawl counters
task_mgr.increment_crawl_count(task_id, crawl_count=5, valid_count=3)

# Check the threshold
task_mgr.check_threshold(task_id)

# Get statistics for a given task date
stats = task_mgr.get_task_statistics('20260119')
```

## 4. Testing

Run the test script:

```bash
python test_query_task.py
```

## 5. Task Status Flow

```
ready (pending)
  ↓
doing (in progress)
  ↓
finished (completed) / failed (failed) / closed (closed after reaching the threshold)
```

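
The flow above can be sketched as a small transition table. This is only an illustration: the names `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical, and treating `finished`, `failed`, and `closed` as terminal states is an assumption read off the diagram, not something `QueryTaskManager` is known to enforce.

```python
# Allowed status transitions, read off the diagram above.
# Assumption: the three end states are terminal (no retries or reopening).
ALLOWED_TRANSITIONS = {
    "ready": {"doing"},
    "doing": {"finished", "failed", "closed"},
    "finished": set(),
    "failed": set(),
    "closed": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from `current` to `new` follows the diagram."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


print(can_transition("ready", "doing"))     # ready tasks can start
print(can_transition("ready", "finished"))  # skipping 'doing' is not allowed
print(can_transition("closed", "doing"))    # closed is treated as terminal
```

A check like this could run before a status update so that an out-of-order update is caught early.
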
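## 6. Notes

1. **Unique constraint**: a given query word can have only one task per day.
2. **Threshold check**: a task is closed automatically once it reaches `threshold_max`.
3. **Priority**: lower numbers mean higher priority (range 1-10).
4. **Timestamps**: a status change automatically updates the corresponding timestamp column.
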
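
The threshold, priority, and date-format rules above can be sketched in plain Python. Everything here is illustrative: the `Task` dataclass and the helpers `apply_threshold`, `order_by_priority`, and `format_task_date` are hypothetical, and comparing `current_count` against `threshold_max` with `>=` is an assumption about how the check is performed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    """Hypothetical in-memory view of a few table columns."""
    query_word: str
    priority: int          # 1-10, lower number means higher priority
    threshold_max: int
    current_count: int = 0
    status: str = "doing"


def apply_threshold(task: Task) -> Task:
    """Close the task once current_count reaches threshold_max (assumed >=)."""
    if task.current_count >= task.threshold_max:
        task.status = "closed"
    return task


def order_by_priority(tasks: list[Task]) -> list[Task]:
    """Sort so that the lowest priority number comes first."""
    return sorted(tasks, key=lambda t: t.priority)


def format_task_date(d: date) -> str:
    """Format a date as the YYYYMMDD string stored in task_date."""
    return d.strftime("%Y%m%d")


tasks = [
    Task("a", priority=5, threshold_max=50, current_count=10),
    Task("b", priority=1, threshold_max=50, current_count=50),
    Task("c", priority=3, threshold_max=50, current_count=49),
]
ordered = [t.query_word for t in order_by_priority(tasks)]
statuses = {t.query_word: apply_threshold(t).status for t in tasks}
print(ordered)
print(statuses)
print(format_task_date(date(2026, 1, 19)))
```
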