Add README.md
# AI Image Deduplication Review System

An image similarity detection system built on DashScope multimodal embeddings and the DashVector vector database.

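## Features

- Generates image vectors with DashScope multimodal embeddings
- Runs efficient vector similarity search with DashVector
- Supports pHash (perceptual hash) pre-screening
- Downloads and processes images asynchronously in batches
- Automatically flags duplicate images and records their similarity scores
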
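The pHash pre-screening step can be illustrated with a minimal sketch. It assumes each image already has a 64-bit perceptual hash stored as an integer (computed elsewhere, for example by an image-hashing library) and that two images count as candidates when the Hamming distance between their hashes is at most `phash_threshold` (10 in the sample configuration). The function names and the `<=` comparison are hypothetical; the project's own code may differ.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def is_phash_candidate(hash_a: int, hash_b: int, threshold: int = 10) -> bool:
    """Return True when two hashes are close enough to justify a vector comparison.

    Assumption: a distance equal to the threshold still counts as a candidate.
    """
    return hamming_distance(hash_a, hash_b) <= threshold


# Two hashes that differ in exactly two bit positions.
distance = hamming_distance(0b1011, 0b0010)
```

A cheap bitwise check like this can rule out clearly different images before any embedding request or vector search is made.
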
## Requirements

- Python 3.8+
- MySQL database
- DashScope API key
- DashVector API key

## Installation

```bash
pip install -r requirements.txt
```

## Configuration

Create a `config.ini` configuration file:

```ini
[database]
host = localhost
port = 3306
user = root
password = your_password
database = your_database
charset = utf8mb4

[dashscope]
api_key = your_dashscope_api_key

[dashvector]
api_key = your_dashvector_api_key
endpoint = your_endpoint
collection_name = image_vectors
vector_dimension = 1024

[image]
cdn_base = https://your-cdn.com/

[similarity]
phash_threshold = 10
vector_threshold = 0.85

[process]
batch_size = 100
concurrent_downloads = 10
log_level = INFO
log_file = image_similarity.log
```

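As a rough illustration of how such a file might be read, the sketch below parses the `[similarity]` and `[process]` sections with Python's standard `configparser` and converts the values to typed fields. The `SimilaritySettings` class and `load_settings` function are hypothetical names, not part of this project; the sample text is parsed from a string so the snippet runs without a real `config.ini`.

```python
import configparser
from dataclasses import dataclass


@dataclass
class SimilaritySettings:
    """Typed view of the thresholds and batch options (hypothetical helper)."""
    phash_threshold: int
    vector_threshold: float
    batch_size: int
    concurrent_downloads: int


def load_settings(text: str) -> SimilaritySettings:
    """Parse INI text and pull out the numeric settings with the right types."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return SimilaritySettings(
        phash_threshold=parser.getint("similarity", "phash_threshold"),
        vector_threshold=parser.getfloat("similarity", "vector_threshold"),
        batch_size=parser.getint("process", "batch_size"),
        concurrent_downloads=parser.getint("process", "concurrent_downloads"),
    )


SAMPLE = """
[similarity]
phash_threshold = 10
vector_threshold = 0.85

[process]
batch_size = 100
concurrent_downloads = 10
"""

settings = load_settings(SAMPLE)
```

To read the real file, `parser.read("config.ini")` can replace `read_string`. If a password contains `%`, creating the parser with `configparser.ConfigParser(interpolation=None)` avoids interpolation errors.
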
## Usage

```bash
python image_similarity_check.py
```

## Project Structure

```
├── image_similarity_check.py  # Main program: image deduplication review
├── query_status.py            # Query processing status
├── reset_data.py              # Reset data
├── reset_vector.py            # Reset the vector store
├── basket.py                  # Test script
├── requirements.txt           # Dependencies
└── config.ini                 # Configuration file (not committed)
```

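## Workflow

1. Fetch pending image records from the database
2. Call the DashScope API to obtain each image's multimodal embedding
3. Search DashVector for similar images
4. Decide whether the image is a duplicate based on the similarity threshold
5. Update the status in the database (duplicate / not duplicate)
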
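The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's implementation: the embedding call is replaced by an injected `embed` function, DashVector is replaced by a small in-memory index, the score is assumed to be a cosine similarity where higher means more similar, and images judged unique are assumed to be added to the index. All names here (`InMemoryIndex`, `review_batch`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Stand-in for the vector database: brute-force nearest-neighbour search."""

    def __init__(self) -> None:
        self._vectors: Dict[int, Vector] = {}

    def add(self, image_id: int, vector: Vector) -> None:
        self._vectors[image_id] = vector

    def nearest(self, vector: Vector) -> Optional[Tuple[int, float]]:
        """Return (image_id, score) of the most similar stored vector, if any."""
        best: Optional[Tuple[int, float]] = None
        for image_id, stored in self._vectors.items():
            score = cosine_similarity(vector, stored)
            if best is None or score > best[1]:
                best = (image_id, score)
        return best


def review_batch(
    records: List[Tuple[int, str]],
    embed: Callable[[str], Vector],
    index: InMemoryIndex,
    threshold: float = 0.85,
) -> List[dict]:
    """Embed each image, search the index, and label it duplicate or unique."""
    results = []
    for image_id, url in records:
        vector = embed(url)
        match = index.nearest(vector)
        if match is not None and match[1] >= threshold:
            results.append({"id": image_id, "status": "duplicate",
                            "matched_id": match[0], "score": match[1]})
        else:
            # Assumption: only unique images are stored for later comparisons.
            index.add(image_id, vector)
            results.append({"id": image_id, "status": "unique",
                            "matched_id": None, "score": match[1] if match else None})
    return results


# Toy embeddings keyed by URL, standing in for the real embedding API.
fake_vectors = {"a.jpg": [1.0, 0.0], "b.jpg": [0.99, 0.1], "c.jpg": [0.0, 1.0]}
records = [(1, "a.jpg"), (2, "b.jpg"), (3, "c.jpg")]
results = review_batch(records, fake_vectors.__getitem__, InMemoryIndex())
```

In a real run the returned status and score would be written back to the database in step 5; here they are simply collected in a list.
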