init commit

This commit is contained in:
徐微
2025-12-08 15:20:22 +08:00
commit 1d0077510a
28 changed files with 9050234 additions and 0 deletions

194
README.md Normal file

@@ -0,0 +1,194 @@
# crawler_tiktok
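A TikTok data scraping script that runs in two stages:
- Search for video links by keyword and write a snapshot (`links`).
- Fetch comments and second-level replies for those video links, writing a snapshot and an optional CSV (`comments`).
The scraping stages are built entirely on the Python standard library (`urllib`, `threading`, and so on) and need no third-party packages; only the optional MySQL import uses `pymysql`.
## Directory structure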
```
crawler_tiktok/
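├─ core/ # cURL text parsing and request sending
│ └─ curl.py
├─ tiktok/ # TikTok-specific logic
│ ├─ search.py # search for video links by keyword
│ └─ comments.py # fetch comments and second-level replies
├─ data/ # sample data and output directory
│ ├─ 1.text # cURL text (contains multiple curl command blocks)
│ ├─ keyword.txt # keyword file (one keyword per line)
│ ├─ urls.json # link search snapshot output (sample included)
│ ├─ comments.csv # comment CSV output (optional)
│ └─ store.py # shared snapshot writer
├─ utils/ # general I/O helpers
│ └─ io.py
├─ main.py # command-line entry point (subcommands: links / comments / all)
└─ __init__.py # package entry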
```
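## Prerequisites
- Install Python (3.8+ recommended).
- Prepare `data/1.text`:
  - Open TikTok in a browser and log in, select the relevant request in the Network panel of the developer tools, and copy it with "Copy as cURL".
  - Put the `curl ...` for the comment endpoint first and the `curl ...` for the search endpoint second; a line break between the two blocks is enough.
  - Keep the request headers (especially `cookie`) so the endpoints respond normally.
- Prepare the keyword file `data/keyword.txt` (one keyword per line), or pass keywords on the command line.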
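Before running anything, it can help to confirm that `data/1.text` holds the two blocks in the expected order. The following is a minimal sketch, not part of the project: `check_curl_order` and the two path markers are hypothetical names, and the markers are assumptions taken from the sample requests shipped in `data/1.text`.

```python
import re

# Assumed URL path markers, taken from the sample requests in data/1.text.
COMMENT_MARKER = "/api/comment/list/"
SEARCH_MARKER = "/api/search/"


def first_urls(text):
    """Return the first quoted argument that follows each `curl` keyword."""
    return re.findall(r"\bcurl\s+['\"](.*?)['\"]", text, re.S)


def check_curl_order(text):
    """Return a list of problems; an empty list means the order looks right."""
    urls = first_urls(text)
    if len(urls) < 2:
        return ["expected at least two curl blocks, found %d" % len(urls)]
    problems = []
    if COMMENT_MARKER not in urls[0]:
        problems.append("block 1 does not look like the comment endpoint")
    if SEARCH_MARKER not in urls[1]:
        problems.append("block 2 does not look like the search endpoint")
    return problems


sample = (
    "curl 'https://www.tiktok.com/api/comment/list/?aweme_id=1' \\\n"
    "  -H 'accept: */*'\n"
    "curl 'https://www.tiktok.com/api/search/general/full/?keyword=x' \\\n"
    "  -H 'accept: */*'\n"
)
print(check_curl_order(sample))  # prints []
```

To check a real file, read it with `open('data/1.text', encoding='utf-8').read()` and pass the text to `check_curl_order`.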
## Quick start
Run the script directly from the repository root (must be `D:\work\crawler_tiktok`):
```
python main.py -h
```
### 1) Search video links (links)
Searches the keywords concurrently, deduplicates the combined results, and saves them to `urls.json`:
```
python main.py links \
--keywords-file data\keyword.txt \
--file-path data\1.text \
--out data\urls.json \
--max-pages 50 \
--count 12 \
--timeout 30 \
--workers 5
```
Optional:
- Pass `--keyword` repeatedly to supply several keywords (can be combined with `--keywords-file`).
- `--keywords`: a comma-separated string of keywords.
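The three keyword sources are merged and deduplicated before searching. A minimal sketch of that merge is shown here; `merge_keywords` is a hypothetical helper, and keeping first-seen order is an assumption of this sketch rather than documented behaviour.

```python
def merge_keywords(repeated=None, comma_separated=None, file_lines=None):
    """Combine the three keyword sources, dropping blanks and duplicates.

    First-seen order is preserved; that ordering is an assumption of this sketch.
    """
    candidates = list(repeated or [])
    if comma_separated:
        candidates.extend(comma_separated.split(","))
    candidates.extend(file_lines or [])

    seen = set()
    merged = []
    for raw in candidates:
        keyword = raw.strip()
        if keyword and keyword not in seen:
            seen.add(keyword)
            merged.append(keyword)
    return merged


merged = merge_keywords(
    repeated=["acrylic marker pens"],                          # --keyword (repeatable)
    comma_separated="liquid marker pen, acrylic marker pens",  # --keywords
    file_lines=["direct liquid pen\n", "\n", "liquid marker pen\n"],  # --keywords-file
)
print(merged)
# prints ['acrylic marker pens', 'liquid marker pen', 'direct liquid pen']
```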
Example structure of the `urls.json` output:
```json
{
"keywords": ["xxx", "yyy"],
"items": [
{"keyword": "xxx", "count": 10, "links": ["https://www.tiktok.com/@user/video/123", ...]},
{"keyword": "yyy", "count": 8, "links": [ ... ]}
],
"total_count": 17855,
"links": ["https://www.tiktok.com/@user/video/123", ...]
}
```
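For readers who want to produce or validate a file of this shape, here is a minimal sketch of how such a snapshot could be assembled. `build_links_snapshot` is a hypothetical function; the per-keyword `count` being the raw list length and `total_count` being the number of globally deduplicated links are both assumptions, since the README does not define them.

```python
import json


def build_links_snapshot(results):
    """Build a urls.json-style dict from a list of (keyword, links) pairs."""
    items = []
    all_links = []
    seen = set()
    for keyword, links in results:
        # Assumption: the per-keyword count is the length of that keyword's list.
        items.append({"keyword": keyword, "count": len(links), "links": list(links)})
        for link in links:
            if link not in seen:
                seen.add(link)
                all_links.append(link)
    return {
        "keywords": [keyword for keyword, _ in results],
        "items": items,
        # Assumption: total_count counts the globally deduplicated links.
        "total_count": len(all_links),
        "links": all_links,
    }


snapshot = build_links_snapshot([
    ("xxx", ["https://www.tiktok.com/@user/video/123",
             "https://www.tiktok.com/@user/video/456"]),
    ("yyy", ["https://www.tiktok.com/@user/video/123"]),
])
print(json.dumps(snapshot, indent=2))
```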
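### 2) Fetch comments and replies (comments)
Reads links from the link snapshot, fetches top-level comments and second-level replies, and saves them as JSON plus an optional CSV.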
```
python main.py comments \
--links-json data\urls.json \
--out data\tik_comments.json \
--file-path data\1.text \
--count 100 \
--pages 100 \
--timeout 30 \
--reply-count 100 \
--reply-pages 100 \
--csv data\comments.csv \
--workers 8
```
Example structure of the `tik_comments.json` output:
```json
{
"items": [
{
"link": "https://www.tiktok.com/@user/video/123",
"count": 42,
"comments": [
{
"cid": "xxx",
"text": "...",
"user": {"unique_id": "..."},
"replies": [{"text": "..."}, ...],
"reply_count": 3
}
]
}
]
}
```
If `--csv` is given, top-level comments and replies are each appended to that file as `username,text` rows.
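The flattening from the JSON structure to `username,text` rows could look roughly like the sketch below. `comment_rows` and `append_csv` are hypothetical names, and the sketch assumes that each reply carries the same `user.unique_id` field as a top-level comment, which the example above does not show.

```python
import csv


def comment_rows(item):
    """Yield (username, text) for each comment, followed by its replies."""
    for comment in item.get("comments", []):
        yield comment.get("user", {}).get("unique_id", ""), comment.get("text", "")
        for reply in comment.get("replies", []):
            # Assumption: replies have the same `user` shape as comments.
            yield reply.get("user", {}).get("unique_id", ""), reply.get("text", "")


def append_csv(path, items):
    """Append all rows to `path` and return the number of rows written."""
    written = 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for item in items:
            for row in comment_rows(item):
                writer.writerow(row)
                written += 1
    return written


demo_item = {
    "link": "https://www.tiktok.com/@user/video/123",
    "count": 1,
    "comments": [{
        "cid": "1",
        "text": "nice",
        "user": {"unique_id": "alice"},
        "replies": [{"text": "thanks", "user": {"unique_id": "bob"}}],
        "reply_count": 1,
    }],
}
rows = list(comment_rows(demo_item))
print(rows)  # prints [('alice', 'nice'), ('bob', 'thanks')]
```

Opening the file in append mode with `newline=""` lets the `csv` module handle quoting of commas and line breaks inside comment text.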
### 3) End-to-end run (all)
Chains link search and comment fetching in a single run, which suits pipeline execution:
```
python main.py all \
--keywords-file data\keyword.txt \
--file-path data\1.text \
--links-out data\urls.json \
--search-max-pages 50 \
--search-count 12 \
--search-timeout 30 \
--search-workers 5 \
--comments-out data\tik_comments.json \
--comments-count 100 \
--comments-pages 100 \
--comments-timeout 30 \
--comments-limit 1000 \
--reply-count 100 \
--reply-pages 100 \
--reply-limit 2000 \
--csv data\comments.csv \
--comments-workers 8
```
### 4) Write to MySQL (import from CSV)
Run from `D:\work\crawler_tiktok`:
```
pip install pymysql
python main.py mysql \
--csv data\comments.csv \
--host localhost \
--port 3306 \
--user root \
--password <your password> \
--database crawler_tiktok \
--table comments
```
If the database does not exist, create it in MySQL first:
```
CREATE DATABASE IF NOT EXISTS `crawler_tiktok` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```
The import creates the table in the specified database automatically (if it does not exist) and batch-inserts the two columns `username,text`.
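As a rough illustration of a "create if missing, then batch insert" import, the sketch below uses `pymysql`. It is not the project's implementation: `read_rows`, `chunked`, and `import_rows` are hypothetical names, and the table schema and batch size are assumptions, since the README does not state the column types.

```python
import csv


def read_rows(csv_path):
    """Read (username, text) pairs, skipping rows with fewer than two columns."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [(row[0], row[1]) for row in csv.reader(f) if len(row) >= 2]


def chunked(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def import_rows(rows, table="comments", batch_size=1000, **connect_kwargs):
    """Create the table if needed and insert rows in batches (needs pymysql)."""
    import pymysql  # third-party; imported lazily so the helpers above stay stdlib-only

    # Hypothetical schema. `table` is interpolated into SQL, so it must come
    # from a trusted source, never from user input.
    create_sql = (
        "CREATE TABLE IF NOT EXISTS `%s` ("
        "`id` BIGINT AUTO_INCREMENT PRIMARY KEY, "
        "`username` VARCHAR(255), `text` TEXT"
        ") DEFAULT CHARSET=utf8mb4" % table
    )
    insert_sql = "INSERT INTO `%s` (`username`, `text`) VALUES (%%s, %%s)" % table
    conn = pymysql.connect(charset="utf8mb4", **connect_kwargs)
    try:
        with conn.cursor() as cur:
            cur.execute(create_sql)
            for batch in chunked(rows, batch_size):
                cur.executemany(insert_sql, batch)
        conn.commit()
    finally:
        conn.close()
```

A call might look like `import_rows(read_rows('data/comments.csv'), host='localhost', port=3306, user='root', password='...', database='crawler_tiktok')`. Committing once at the end keeps the import all-or-nothing on transactional tables.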
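## Key parameters
- `--keyword / --keywords / --keywords-file`: three ways to supply keywords; they are merged and deduplicated.
- `--file-path`: path to the cURL text file (containing multiple `curl ...` command blocks).
  - Block 1 is the template for the comment endpoint.
  - Block 2 is the template for the search endpoint.
- Search stage: `--max-pages` caps the number of paging rounds; `--count` is the number of results per page (by default inferred from the URL, typically 12); `--workers` is the number of concurrent threads.
- Comment stage: `--pages` caps the number of comment pages; `--count` is the number of comments per page; `--reply-count` / `--reply-pages` set the replies per page and the reply page cap; `--workers` is the number of concurrent fetch threads.
- `--timeout`: request timeout in seconds.
- `--csv`: if given, top-level comments and replies are appended to this CSV as `username,text`.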
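Because each curl block acts as a template, paging and switching videos or keywords come down to replacing query parameters such as `aweme_id`, `cursor`, `keyword`, `offset`, or `count` in the captured URL. How `search.py` and `comments.py` do this is not shown here; the following is a minimal sketch using only `urllib.parse`, with `with_params` as a hypothetical helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_params(url, **overrides):
    """Return `url` with the given query parameters replaced or appended."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    remaining = {key: str(value) for key, value in overrides.items()}
    updated = []
    for key, value in pairs:
        if key in remaining:
            # Replace the first occurrence in place to keep parameter order.
            updated.append((key, remaining.pop(key)))
        elif key not in overrides:
            updated.append((key, value))
        # Later duplicates of an overridden key are dropped.
    updated.extend(remaining.items())  # parameters that were not in the URL
    return urlunsplit(parts._replace(query=urlencode(updated)))


base = "https://www.tiktok.com/api/comment/list/?aid=1988&aweme_id=111&count=20&cursor=0"
next_page = with_params(base, aweme_id="222", cursor=20)
print(next_page)
# prints https://www.tiktok.com/api/comment/list/?aid=1988&aweme_id=222&count=20&cursor=20
```

Note that `parse_qsl` followed by `urlencode` re-encodes every value (for example, `%20` becomes `+`), so a URL rebuilt this way may differ byte for byte from the captured one even for parameters that were not changed.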
## Output files
- `data/urls.json`: link search snapshot, containing `keywords/items/total_count/links`.
- `data/tik_comments.json`: comment snapshot, containing `items` (each with `link/count/comments`).
- `data/comments.csv`: comments and replies in CSV format (username, text).
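## FAQ
- Empty responses or errors: first check that the cURL in `data/1.text` is still valid and that the `cookie` has not expired.
- Rate limiting: lower `--workers`, raise `--timeout`, or run in batches.
- Windows paths: the examples use backslashes; on Unix-like systems use `/` instead. The trailing `\` on each line is a Unix-shell line continuation; in `cmd` use `^`, in PowerShell use a backtick, or put the whole command on one line.
- Progress output: the crawl prints START/DONE/ERROR along with comment statistics, so you can follow its progress.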
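The interplay of `--workers` and the START/DONE/ERROR progress lines can be pictured with the sketch below. It uses `concurrent.futures` from the standard library, which is this sketch's choice and not necessarily what the project does; `run_all` and `fake_fetch` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(links, fetch, workers=4):
    """Run fetch(link) in a thread pool, printing START/DONE/ERROR per link."""
    results, errors = {}, {}

    def task(link):
        print("START", link)
        return fetch(link)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(task, link): link for link in links}
        for future in as_completed(futures):
            link = futures[future]
            try:
                results[link] = future.result()
                print("DONE", link, len(results[link]))
            except Exception as exc:  # one failed link must not stop the rest
                errors[link] = exc
                print("ERROR", link, exc)
    return results, errors


def fake_fetch(link):
    """Stand-in for a real comment fetcher; fails for one link on purpose."""
    if link.endswith("/2"):
        raise TimeoutError("request timed out")
    return ["comment-a", "comment-b"]


results, errors = run_all(
    ["https://www.tiktok.com/@user/video/1", "https://www.tiktok.com/@user/video/2"],
    fake_fetch,
    workers=2,
)
```

Lowering `workers` reduces the number of requests in flight at once, which is the knob the rate-limiting advice above refers to.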

10
__init__.py Normal file

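@@ -0,0 +1,10 @@
"""crawler_tiktok package

Collects video links from TikTok and fetches their comments and replies.

Modules:
- core: basic building blocks (such as parsing URLs and request headers from curl text)
- tiktok: TikTok-specific scraping logic (search, comments)
- utils: general I/O helpers (JSON/CSV reading and writing, keyword file loading)
- data: snapshot writer (store) and sample data files

Entry point: main.py provides the command-line subcommands links/comments/all.
"""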

Binary file not shown.

74
core/curl.py Normal file

@@ -0,0 +1,74 @@
"""Utilities for parsing curl text and sending requests.

Responsibilities:
- Split text containing multiple curl commands into blocks.
- Parse the URL and request headers (including Cookie) from each block.
- Send a GET request built from the parsed result and try to return JSON.
"""
import re
import json
from urllib.request import Request, urlopen


def _split_curl_blocks(text):
    """Split the text into command blocks at each occurrence of the `curl ` keyword."""
    blocks = []
    indices = [m.start() for m in re.finditer(r"\bcurl\s", text)]
    if not indices:
        return blocks
    for i, start in enumerate(indices):
        end = indices[i + 1] if i + 1 < len(indices) else len(text)
        blocks.append(text[start:end])
    return blocks


def _parse_block(block):
    """Parse the URL and headers from a single curl command block.

    Returns `{'url': str, 'headers': dict}`, or None if no URL can be parsed.
    """
    url_m = re.search(r"curl\s+['\"](.*?)['\"]", block, re.S)
    if not url_m:
        return None
    url = url_m.group(1)
    headers = {}
    # Match the closing quote to the opening one (\1) so that values containing
    # the other quote character, such as sec-ch-ua, are not cut short.
    for hm in re.finditer(r"-H\s+(['\"])([^:]+):\s*(.*?)\1", block):
        k = hm.group(2).strip()
        v = hm.group(3).strip()
        headers[k.lower()] = v
    # `-b` carries the cookie string; it overrides any cookie header parsed above.
    cm = re.search(r"-b\s+['\"](.*?)['\"]", block, re.S)
    if cm:
        headers['cookie'] = cm.group(1)
    return {'url': url, 'headers': headers}


def parse_curl_file(file_path):
    """Read a curl text file and parse it into a list of request descriptions.

    Args: `file_path` is the path to the file.
    Returns: a list whose items each contain `url` and `headers`.
    """
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    blocks = _split_curl_blocks(text)
    result = []
    for b in blocks:
        parsed = _parse_block(b)
        if parsed:
            result.append(parsed)
    return result


def fetch_from_curl(file_path, index=0, timeout=30):
    """Pick a parsed request by index and send it as a GET.

    Args: `index` selects the curl block; `timeout` is the request timeout in seconds.
    Returns: the parsed JSON if decoding succeeds, otherwise the raw bytes;
    None if `index` is out of range.
    """
    reqs = parse_curl_file(file_path)
    if not reqs or index < 0 or index >= len(reqs):
        return None
    item = reqs[index]
    req = Request(item['url'], headers=item['headers'], method='GET')
    with urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    try:
        return json.loads(data.decode('utf-8', errors='ignore'))
    except Exception:
        return data

28
data/1.text Normal file

@@ -0,0 +1,28 @@
curl 'https://www.tiktok.com/api/comment/list/?WebIdLastTime=1762843898&aid=1988&app_language=en&app_name=tiktok_web&aweme_id=7554313767425985806&browser_language=zh-CN&browser_name=Mozilla&browser_online=true&browser_platform=Win32&browser_version=5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F142.0.0.0%20Safari%2F537.36&channel=tiktok_web&cookie_enabled=true&count=20&cursor=0&data_collection_enabled=true&device_id=7571356851431851534&device_platform=web_pc&focus_state=true&from_page=video&history_len=4&is_fullscreen=false&is_page_visible=true&odinId=7571357724144124941&os=windows&priority_region=US&referer=https%3A%2F%2Fwww.google.com%2F&region=US&root_referer=https%3A%2F%2Fwww.google.com%2F&screen_height=1080&screen_width=1920&tz_name=Asia%2FShanghai&user_is_login=true&verifyFp=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y&webcast_language=en&msToken=HGxJ50IHbcJxidsAy4biksW4jCfUpjG5IOfoNZd9m24WyL0muFz7f02qUT2A4HCKPQPheRtCr66460XMCJQ9mCplXR1zk1fKK81mU65TLKdczaVqauDay_1qAol348Cg_iQaiK74qPZ0EoJBKu_iZbCATA==&X-Bogus=DFSzsIVuMqxANCCACObDM-ZLJrPe&X-Gnarly=MxLz6L1B1jRWAkdMAhXfoRcZW7o7E89jQUXQepysu7jSC47hCDAgLaFj6ATg13br-ct2WppjvVuo3DrB5foJoo3XOjJH6TVzfVkPLs8Sw47ja/0uC5DB6DPtfPWekO9g9-ZviZeREnpG/N2SRXbqDr0-Go5o0OzoRp9wdRUoLSAM5nbo0niphLjDxyOzdsW/RqxqQNFbBnJJkNqIH4TiXbQmNafqX1Yk5cGaIFH7FHcjZsYRbtA2gc8cXePp2guxR5cXDepaBF2Wgsdmu8VM2q8ed8o7ohQXi56GiUuUXjLJJ130mlhYHJHgYHtxYynbeRw=' \
-H 'accept: */*' \
-H 'accept-language: zh-CN,zh;q=0.9' \
-b 'passport_csrf_token=fc48fa188f4d67baad733476f64baccf; passport_csrf_token_default=fc48fa188f4d67baad733476f64baccf; last_login_method=email; delay_guest_mode_vid=5; living_user_id=480687482446; tiktok_webapp_theme_source=auto; tt_chain_token=tTxp2ztaIaxTaXsGrJb+8Q==; tiktok_webapp_lang=zh-Hans; d_ticket=64bffcf490d4c9c7839c94bfa06f09cf36a43; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763362651%7C2cd9c33c1a5733387a276a04d1af0af2026a80003217b55a841cb13520f1347f; myCookie=rap; fblo_1862952583919182=y; tiktok_webapp_theme=light; multi_sids=7571357724144124941%3Ab0685b23eb2eeb5f5e0a5604801f365b; cmpl_token=AgQQAPNSF-RO0rksMxu3N50083NFwqLP_4zZYKPv8w; sid_guard=b0685b23eb2eeb5f5e0a5604801f365b%7C1763435980%7C15552000%7CSun%2C+17-May-2026+03%3A19%3A40+GMT; uid_tt=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; uid_tt_ss=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; sid_tt=b0685b23eb2eeb5f5e0a5604801f365b; sessionid=b0685b23eb2eeb5f5e0a5604801f365b; sessionid_ss=b0685b23eb2eeb5f5e0a5604801f365b; tt_session_tlb_tag=sttt%7C1%7CsGhbI-su619eClYEgB82W__________pPerzW7vBXrU5jjfVhTeyC35OugALhjpAGeyHj7inp30%3D; sid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; ssid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; store-idc=useast5; store-country-code=us; store-country-code-src=uid; tt-target-idc=useast8; 
tt-target-idc-sign=jh5EgVgJzs2Zhfd2qcTVn1799rd4vK_dkjNT5E1hkY8ey6Iuuuo2qPfonmOJN-73-SEUPvtKH8L04xVHFdfuxGD4jEaT9iJGdE634_1n9RDR1aV8X7xR36LWBtgYwCfK96M28ozQElXgVDFsHS5jNIH0Jfq5gaisBGFWCAz7zEHd3YwWFjSVW96udWsQjGHM_y0UYLKmGEwmNh3nCmKOGntgfvFHrzuxfYL2T6upJ8x8WMb3GG-tGXKw4N05kaWH3LJCY3hGiCOIBX3s8_n0jvn1PLu9yiOUiF2f-K63HqcnVxPsuYfg8iYE47R08TALOuvQE9CFArejv2TMFIjiGKi-MB3BlWwwCUmUSbTyoTLBzSuKT5Elynh0l1JVMxrXlqZn39OMcs8_AB0n_RyyAF3FH9pV0sQ2lza6iJtZim0hmnAuTn8C26lyuss0BP3vlSPtp26rNMyqs1uZlpIclUzI7hmlBV6HNph2l1oBp7QMbvCQLFIGYMJXTL5kP_kX; tt_csrf_token=2alNSmcj-CbgYZfsy7TgGoLanfvaF_S5Ya4M; passport_fe_beating_status=true; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763542066%7C310b419382cc569f618a86d012ca09f9c470719d9734b7ef5e637af0851cc2fc; odin_tt=081974ccd8c05aa99c9def51a2ae7bff2f2eff4b7187eaefc9035f6f29b3cf23b9e7e874eddd540c813b1aeacd524755eeea3b487b1604ab69034829839420a35eb841c2c1660309fb2e8733f8cec76e; store-country-sign=MEIEDEmOSATz1HVkE69LLQQgQ-xv4GcuilURIjdWweEXq86fX22G46h6wFIvL1YxVVUEEOiSOHWEhgwxBRqtF3ZvKh4; s_v_web_id=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y; msToken=HGxJ50IHbcJxidsAy4biksW4jCfUpjG5IOfoNZd9m24WyL0muFz7f02qUT2A4HCKPQPheRtCr66460XMCJQ9mCplXR1zk1fKK81mU65TLKdczaVqauDay_1qAol348Cg_iQaiK74qPZ0EoJBKu_iZbCATA==; msToken=jWvit9euZoPFKgYZamkCxcNRJFbxE2efJZTwjJRsUOFUCqfJXFXMDMp6zFbk7VT7czHFPTMr0hdTmSdJybWT0mwnHYYrra7EQWKq9cwTr0NAdoyzkp-xrBuSlxD6a4Fcai6a72-qqTi1FxkAWjmrliiqwg==' \
-H 'priority: u=1, i' \
-H 'referer: https://www.tiktok.com/@user73001399001191/video/7561810082636287246' \
-H 'sec-ch-ua: "Chromium";v="142", "Google Chrome";v="142", "Not_A Brand";v="99"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Windows"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36'
curl 'https://www.tiktok.com/api/search/general/full/?WebIdLastTime=1762843898&aid=1988&app_language=en&app_name=tiktok_web&browser_language=zh-CN&browser_name=Mozilla&browser_online=true&browser_platform=Win32&browser_version=5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F142.0.0.0%20Safari%2F537.36&channel=tiktok_web&client_ab_versions=70508271%2C72437276%2C73547759%2C73720540%2C74444736%2C74446915%2C74465410%2C74627577%2C74679798%2C74703728%2C74744616%2C74746519%2C74757744%2C74780477%2C74782564%2C74793838%2C74798355%2C74803471%2C74808328%2C74824020%2C74843467%2C74852654%2C74860161%2C74879745%2C74879783%2C74882809%2C74891662%2C74902367%2C74926160%2C74928117%2C74935708%2C74936938%2C74970253%2C74972148%2C74973673%2C74976255%2C74980175%2C74983940%2C74994853%2C75001423%2C75005876%2C70405643%2C70772958%2C71057832%2C71200802%2C71381811%2C71516509%2C71803300%2C71962127%2C72360691%2C72361743%2C72408100%2C72854054%2C72892778%2C73171280%2C73208420%2C73989921%2C74276218%2C74611443%2C74844724&cookie_enabled=true&count=16&data_collection_enabled=true&device_id=7571356851431851534&device_platform=web_pc&device_type=web_h265&focus_state=true&from_page=search&history_len=5&is_fullscreen=false&is_page_visible=true&keyword=%E6%B6%82%E9%B8%A6%E7%BB%98%E7%94%BB&odinId=7571357724144124941&offset=0&os=windows&priority_region=US&referer=https%3A%2F%2Fwww.google.com%2F&region=US&root_referer=https%3A%2F%2Fwww.google.com%2F&screen_height=1080&screen_width=1920&search_source=search_history&tz_name=Asia%2FShanghai&user_is_login=true&verifyFp=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y&web_search_code=%7B%22tiktok%22%3A%7B%22client_params_x%22%3A%7B%22search_engine%22%3A%7B%22ies_mt_user_live_video_card_use_libra%22%3A1%2C%22mt_search_general_user_live_card%22%3A1%7D%7D%2C%22search_server%22%3A%7B%7D%7D%7D&webcast_language=en&msToken=Evo6YZn35dd6dAsaNUBm7WHaOCixR84Hwjo6DVlNBGE0L56xiDF_dmDWfyJIJGq8LDEsjNm5G9H3uMP9
LlVsCunVwx0lMEnriQWWWuzpN7Xp4j0Fj5wXbgMEqU9KMd5YfkZ1iqFubWhu99nvT06p5qpUeg==&X-Bogus=DFSzsIVu7-iANCCACObDh-ZLJrOb&X-Gnarly=M5J8rVZ10jjW3H5JvTrrLC6MGn7Qq4X0NFfuLZ1UYP2F5Tyem4CUCigEriTnj4Ui3kZdlhNogxKstfzoLHWeSKWiubEdsYZpiegkx-Ot2OUSwbyC9mcwB8T80j7nJpzf6tMOisjjbGiGzYQDJuNrqgxehrCDKUfdA6CbLeoguoGy7XQjTDxmg3/VsSqdhziaenBlm72xVj0GyLUEgrboEwzp11Xphma3Qo8b-/uiMZWQDNyJaC7rcb11dW-ffpSTMrXvf6EU6QXJav2NYvS2gNjMJBhMf15s0-NNQQIC-USLgeAWEo5Wj-gXgn/YmaUd7hZ=' \
-H 'accept: */*' \
-H 'accept-language: zh-CN,zh;q=0.9' \
-b 'passport_csrf_token=fc48fa188f4d67baad733476f64baccf; passport_csrf_token_default=fc48fa188f4d67baad733476f64baccf; last_login_method=email; delay_guest_mode_vid=5; living_user_id=480687482446; tiktok_webapp_theme_source=auto; tt_chain_token=tTxp2ztaIaxTaXsGrJb+8Q==; tiktok_webapp_lang=zh-Hans; d_ticket=64bffcf490d4c9c7839c94bfa06f09cf36a43; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763362651%7C2cd9c33c1a5733387a276a04d1af0af2026a80003217b55a841cb13520f1347f; myCookie=rap; fblo_1862952583919182=y; tiktok_webapp_theme=light; multi_sids=7571357724144124941%3Ab0685b23eb2eeb5f5e0a5604801f365b; cmpl_token=AgQQAPNSF-RO0rksMxu3N50083NFwqLP_4zZYKPv8w; sid_guard=b0685b23eb2eeb5f5e0a5604801f365b%7C1763435980%7C15552000%7CSun%2C+17-May-2026+03%3A19%3A40+GMT; uid_tt=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; uid_tt_ss=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; sid_tt=b0685b23eb2eeb5f5e0a5604801f365b; sessionid=b0685b23eb2eeb5f5e0a5604801f365b; sessionid_ss=b0685b23eb2eeb5f5e0a5604801f365b; tt_session_tlb_tag=sttt%7C1%7CsGhbI-su619eClYEgB82W__________pPerzW7vBXrU5jjfVhTeyC35OugALhjpAGeyHj7inp30%3D; sid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; ssid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; store-idc=useast5; store-country-code=us; store-country-code-src=uid; tt-target-idc=useast8; 
tt-target-idc-sign=jh5EgVgJzs2Zhfd2qcTVn1799rd4vK_dkjNT5E1hkY8ey6Iuuuo2qPfonmOJN-73-SEUPvtKH8L04xVHFdfuxGD4jEaT9iJGdE634_1n9RDR1aV8X7xR36LWBtgYwCfK96M28ozQElXgVDFsHS5jNIH0Jfq5gaisBGFWCAz7zEHd3YwWFjSVW96udWsQjGHM_y0UYLKmGEwmNh3nCmKOGntgfvFHrzuxfYL2T6upJ8x8WMb3GG-tGXKw4N05kaWH3LJCY3hGiCOIBX3s8_n0jvn1PLu9yiOUiF2f-K63HqcnVxPsuYfg8iYE47R08TALOuvQE9CFArejv2TMFIjiGKi-MB3BlWwwCUmUSbTyoTLBzSuKT5Elynh0l1JVMxrXlqZn39OMcs8_AB0n_RyyAF3FH9pV0sQ2lza6iJtZim0hmnAuTn8C26lyuss0BP3vlSPtp26rNMyqs1uZlpIclUzI7hmlBV6HNph2l1oBp7QMbvCQLFIGYMJXTL5kP_kX; tt_csrf_token=2alNSmcj-CbgYZfsy7TgGoLanfvaF_S5Ya4M; passport_fe_beating_status=true; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763542066%7C310b419382cc569f618a86d012ca09f9c470719d9734b7ef5e637af0851cc2fc; odin_tt=081974ccd8c05aa99c9def51a2ae7bff2f2eff4b7187eaefc9035f6f29b3cf23b9e7e874eddd540c813b1aeacd524755eeea3b487b1604ab69034829839420a35eb841c2c1660309fb2e8733f8cec76e; store-country-sign=MEIEDEmOSATz1HVkE69LLQQgQ-xv4GcuilURIjdWweEXq86fX22G46h6wFIvL1YxVVUEEOiSOHWEhgwxBRqtF3ZvKh4; s_v_web_id=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y; msToken=imJTbrm-SxNJWwT4U4KOTLwsN5UnjQQzm-bHzXVVdiKnymtLbyQXbM_dziPgdrG3rcvbHJf_WWoBIuJ8AegguxYLz_gA0dQNa8suc7aNA3RA_Z7FqCUJd7iO4TJX5lU3dG0Ahchd28z0ip0HADyOygq2sQ==; msToken=Na9bB8_PXmSHmEprxHatJ3iAi6DYMS4DGaKymzxCSv5ho-vkxkGi2Oh4LHpL3LntQloywKO0p5gTBoGjA3BKW7uguLGHfS6FiTPzo5JkbwAMPkYyGdaoQh3yikWSJFGPNMpdrwaN8-ta5IcfWZlfv_QQcQ==' \
-H 'priority: u=1, i' \
-H 'referer: https://www.tiktok.com/search?q=%E6%B6%82%E9%B8%A6%E7%BB%98%E7%94%BB&t=1763542137936' \
-H 'sec-ch-ua: "Chromium";v="142", "Google Chrome";v="142", "Not_A Brand";v="99"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Windows"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36'

Binary file not shown.

88957
data/comments.csv Normal file

File diff suppressed because one or more lines are too long

5548
data/key_comment.csv Normal file

File diff suppressed because it is too large Load Diff

596
data/keyword.txt Normal file

@@ -0,0 +1,596 @@
direct liquid soft head acrylic marker pen
guangna direct liquid soft head acrylic marker pen
guangna direct liquid soft head acrylic marker pen 24 colors
@huangaa_3 guangna direct liquid soft head acrylic marker pen
how to draw with water based markers
acrylic marker holder
acrylic pen markers
liquid acrylic pens
direct liquid pen
direct liquid acrylic marker pen japan
direct liquid acrylic marker pen japanese
direct liquid acrylic marker pen jumbo
direct liquid acrylic marker pen
acrylic marker pens
liquitex acrylic markers review
acrylic marker art
liquitex acrylic marker
direct liquid acrylic marker pen instructions
direct liquid acrylic marker pen ink
direct liquid acrylic marker pen ii
direct liquid acrylic marker pen instructions pdf
direct liquid acrylic marker pen in usa
direct liquid acrylic marker pen msds
direct liquid acrylic marker pen markers
direct liquid acrylic marker pen michaels
direct liquid acrylic marker pen msds sheet
direct liquid acrylic marker pen large
direct liquid acrylic marker pen lowes
direct liquid acrylic marker pen label
direct liquid acrylic marker pen liquid
direct liquid acrylic marker pen light
direct liquid acrylic marker pen liner
direct liquid acrylic marker pen liner review
what pen to use to write on acrylic
liquid marker pen
direct liquid acrylic marker pen kit
direct liquid acrylic marker pen kit instructions
direct liquid acrylic marker pen orange
direct liquid acrylic marker pen only
direct liquid acrylic marker pen on amazon
direct liquid acrylic marker pen oil
direct liquid acrylic marker pen on sale
direct liquid acrylic marker pen off clothes
direct liquid acrylic marker pen off golf balls
direct liquid acrylic marker pen on walls
direct liquid acrylic marker pen pack
direct liquid acrylic marker pen pink
direct liquid acrylic marker pen pen
direct liquid acrylic marker pen purple
direct liquid acrylic marker pen price
direct liquid acrylic marker pen paper mate
direct liquid acrylic marker pen quality
direct liquid acrylic marker pen quick dry
direct liquid acrylic marker pen qvc
direct liquid acrylic marker pen quizlet
direct liquid acrylic marker pen quiz
direct liquid acrylic marker pen refill
direct liquid acrylic marker pen review
direct liquid acrylic marker pen reddit
direct liquid acrylic marker pen red
direct liquid acrylic marker pen target
direct liquid acrylic marker pen tip
direct liquid acrylic marker pen type
direct liquid acrylic marker pen tint
direct liquid acrylic marker pen use
direct liquid acrylic marker pen usa
direct liquid acrylic marker pen us
direct liquid acrylic marker pen walmart
direct liquid acrylic marker pen white
direct liquid acrylic marker pen wholesale
direct liquid acrylic marker pen waterproof
direct liquid acrylic marker pen walgreens
direct liquid acrylic marker pen video
direct liquid acrylic marker pen vintage
direct liquid acrylic marker pen vs
direct liquid acrylic marker pen volume
direct liquid acrylic marker pen vs regular
direct liquid acrylic marker pen vape
direct liquid acrylic marker pen xl
direct liquid acrylic marker pen x2
acrylic marker diy
diy acrylic paint markers
direct liquid acrylic marker pen zoom
direct liquid acrylic marker pen zero
direct liquid acrylic marker pen zipper
direct liquid acrylic marker pen zip
direct liquid acrylic marker pen zoominfo
acrylic marker paper
acrylic marker painting
acrylic liquid pens
acrylic markers blick
what markers can you use on acrylic
are liquid chalk markers permanent
acrylic marker edding
direct liquid acrylic marker pen amazon
direct liquid acrylic marker pen amazon prime
direct liquid acrylic marker pen acrylic
direct liquid acrylic marker pen app
direct liquid acrylic marker pen art
acrylic marker fine tip
acrylic paint markers fine tip
are sharpie gel pens waterproof
liqui-mark gel pens
acrylic marker graffiti
acrylic paint markers how to use
acrylic ink marker
krink acrylic markers
permanent marker on acrylic plastic
liqui-mark permanent markers
liquitex acrylic paint markers
marker acrylic
acrylic paint marker waterproof
acrylic pen marker
direct liquid acrylic marker pen black
direct liquid acrylic marker pen bulk
direct liquid acrylic marker pen blue
direct liquid acrylic marker pen brand
direct liquid acrylic marker pen brand name
direct liquid acrylic marker pen bleeding through paper
direct liquid acrylic marker pen bleeding through paint
direct liquid acrylic marker pen brush photoshop
oil based marker on acrylic paint
acrylic markers on plastic
acrylic paint marker refill
are sharpies water based
are sharpies oil or water based
acrylic marker liquitex
liquitex acrylic pen
direct liquid acrylic marker pen directions
direct liquid acrylic marker pen dollar tree
direct liquid acrylic marker pen dispenser
direct liquid acrylic marker pen directions for use
direct liquid acrylic marker pen dry
direct liquid acrylic marker pen disguises
direct liquid acrylic marker pen depot
direct liquid acrylic marker pens fine line
direct liquid acrylic marker pens fine point
direct liquid acrylic marker pens fine tip
direct liquid acrylic marker pens for painting
direct liquid acrylic marker pen ebay
direct liquid acrylic marker pen ewg
direct liquid acrylic marker pen elite
direct liquid acrylic marker pen elite review
direct liquid acrylic marker pen elite vaporizer
uni acrylic markers
using acrylic markers
what marker to use on acrylic
what type of marker to use on acrylic
what marker works on plastic
where to buy acrylic markers
where to buy acrylic paint pens
are acrylic markers permanent
acrylic marker uses
acrylic marker tutorial
can you use permanent marker on acrylic paint
who acrylic markers
who sells liquitex acrylic paint
are acrylic markers water based
is acrylic liquid monomer
are acrylic pens oil based
are acrylic paint markers waterproof
can you use permanent marker on acrylic
can dry erase markers be used on acrylic
can acrylic markers be used on fabric
acrylic vs oil marker
acrylic markers vs acrylic paint
acrylic paint marker vs oil paint marker
are acrylic paint markers permanent
will dry erase markers work on plexiglass
will permanent marker stick to plastic
will permanent marker stay on silicone
does dry erase markers work on plexiglass
worst acrylic paint
worst acrylic paint brands
worst acrylic powder
art supplies acrylic marker testing comparison
acrylic paint markers permanent
do dry erase markers work on acrylic
do acrylic markers work on fabric
do acrylic paint pens work on plastic
do dry erase markers work on plexiglass
best acrylic markers
best acrylic markers for artists
best acrylic paint pens for plastic
top rated acrylic paint markers
top rated acrylic paint pens
can acrylic paint markers be used on canvas
acrylic marker pens
acrylic marker pen set
acrylic marker pen price
acrylic marker pen uses
acrylic marker pen drawing
acrylic marker pens uk
acrylic marker pen black
acrylic marker pen 24 shades
acrylic marker pen art
acrylic marker pen painting
acrylic marker pens hobbycraft
acrylic marker pen amazon
acrylic marker pen mr diy
acrylic marker pen for fabric
acrylic marker pen how to use
acrylic pen and marker holder
acrylic marker pen himic
acrylic marker pens home bargains
acrylic marker pens hobby lobby
acrylic paint pen hobby lobby
acrylic paint pen holder
acrylic paint pen hobbycraft
acrylic paint pen home depot
acrylic paint pen how to use
acrylic paint pens home bargains
acrylic paint pens hs code
best acrylic paint pens hobbycraft
acrylic marker pen gold
acrylic marker pens guangna
acrylic paint pen gold
acrylic paint pen green
acrylic paint pen grey
acrylic paint pen glass
acrylic paint pen grey set
acrylic paint pen golf ball
acrylic paint pen graffiti
acrylic paint pens grabie
acrylic paint pens glitter
acrylic paint pens guangna
acrylic paint pens gundam
acrylic paint pens gunpla
acrylic paint pen ideas
acrylic paint pen ideas for beginners
acrylic paint pen icon
acrylic paint pens ireland
acrylic paint pens in national bookstore
acrylic paint pens in store
acrylic paint pens india
acrylic paint in pen
acrylic paint pen art ideas
acrylic paint pen drawing ideas
easy acrylic paint pen ideas
which acrylic paint pen is best
acrylic paint pen craft ideas
acrylic paint pens ak interactive
acrylic marker pen flair
acrylic marker pen faber castell
acrylic marker pen fine tip
acrylic marker pen for kids
acrylic marker pen flipkart
acrylic marker pens for glass
acrylic paint pen fine tip
acrylic paint pen for fabric
acrylic paint pen for wood
acrylic paint pen flowers
acrylic paint pen for art and crafts
acrylic paint pen for canvas
acrylic paint pen for glass
acrylic marker pen shopee
acrylic paint pen set
acrylic paint pen storage
acrylic marker sketch pen
acrylic paint pen set uk
acrylic paint pen silver
acrylic paint pen sealer
acrylic paint pen spotlight
acrylic paint pen set nearby
acrylic paint sketch pen
acrylic paint marker pen set
acrylic marker 12 pen set flair brand
acrylic marker pen posca
acrylic marker pen popular
acrylic marker pen peak
acrylic paint pen projects
acrylic paint pen posca
acrylic paint pen price
acrylic paint pen painting
acrylic paint pen pink
acrylic paint pens pna
acrylic paint pens permanent
acrylic paint pens pastel
acrylic paint pens professional
acrylic paint pen joann
acrylic paint pens jumbo
acrylic pen ideas
what pens write on acrylic
do acrylic paint pens work on plastic
what are acrylic paint pens used for
acrylic marker pen ohuhu
acrylic paint pen on fabric
acrylic paint pen on glass
acrylic paint pen officeworks
acrylic paint pen on canvas
acrylic paint pen on wood
acrylic paint pen on plastic
acrylic paint pen on mirror
acrylic paint pen on metal
acrylic paint pen on leather
acrylic paint pen organizer
acrylic paint pen on skin
acrylic paint pen on shirt
acrylic paint pen on ceramic
acrylic marker pen
acrylic marker pen white
acrylic marker pen near me
acrylic paint pens video
acrylic paint pens vs markers
acrylic paint pen vs sharpie
sharpie acrylic paint pens vs posca
acrylic paint pens vs alcohol markers
acrylic pen vs marker
acrylic pen vs permanent marker
acrylic marker vs brush pen
acrylic marker vs color pen
acrylic marker vs gel pen
acrylic paint pens velles
acrylic marker pen under ₹ 100
acrylic marker pen under 200
acrylic marker pen under ₹ 200
acrylic marker pen under ₹ 300
acrylic marker pen under 100
acrylic marker pen under ₹ 400
acrylic paint pen uses
acrylic paint pen ultra fine
acrylic paint pen uk
acrylic paint pens ultra fine tip
acrylic paint pens uk amazon
best acrylic marker pens uk
acrylic marker pens kmart
acrylic paint pen kmart
acrylic paint pen kits
acrylic paint pens kids
acrylic paint pens kuwait
acrylic paint pens nz kmart
acrylic paint dot pens kmart
kokuyo camlin acrylic marker pen
what are acrylic markers used for
does permanent marker stay on acrylic
acrylic paint pens not working
acrylic pens how to use
acrylic marker pen meesho
acrylic marker pen malaysia
acrylic marker pens michaels
acrylic paint pen michaels
acrylic paint pen mont marte
acrylic paint pen metallic
acrylic paint pen molotow
acrylic paint pen mug
acrylic paint pens miniatures
acrylic paint pens mr price
acrylic paint pens medium tip
acrylic paint pens mitre 10
acrylic paint pens michaels nearby
acrylic marker pen quality
acrylic marker pen quiz
acrylic marker pen quick dry
acrylic marker pen qvc
acrylic marker pen quick release
artecho acrylic marker pen
artecho dual tip acrylic marker pen
arrtx acrylic marker pen
acrylic marker vs acrylic paint pen
acrylic paint marker pen amazon
difference between acrylic marker and brush pen
akarued white paint pen acrylic marker
acrylic marker brush pen amazon
marker acrylic pen allegro
acrylic marker and brush pen
is an acrylic marker a paint pen
best acrylic marker pen
black acrylic marker pen
brustro acrylic marker pen
best white acrylic marker pen
baoke acrylic marker pen
acrylic marker brush pen
brush pen vs acrylic marker
acrylic brush marker pen set
acrylic paint marker brush pen
acrylic paint marker pen black
acrylic marker pen box
acrylic paint marker calligraphy brush pen
best acrylic paint marker pen
acrylic marker pens the works
acrylic marker pens the range
acrylic marker pens tesco
acrylic paint pen tooli art
acrylic paint pen tutorial
acrylic paint pen target
acrylic paint pen techniques
acrylic paint pen tips
acrylic paint pen thin
acrylic paint pen thick
acrylic paint pen the range
acrylic paint pen tool art
acrylic paint pen temu
acrylic paint pens the works
acrylic paint pen white
acrylic paint pen walmart
acrylic paint pen water based
acrylic paint pen waterproof
acrylic paint pen warhammer
acrylic paint pen white michaels
acrylic marker with pen
acrylic paint pen washable
acrylic paint pen wood
acrylic paint pens warehouse
acrylic paint pens with brush tip
whsmith acrylic paint pens
acrylic paint pens wholesale
acrylic marker pen refill
acrylic marker pen review
acrylic paint pen refillable
acrylic paint pen reviews
acrylic paint pen removal
acrylic paint pen red
acrylic paint pen reddit
acrylic paint pens range
acrylic paint pens rocks
acrylic paint pens rymans
acrylic paint pens reject shop
acrylic paint pens red dot
acrylic paint pen for resin
acrylic marker pen xl
acrylic marker pen xray
acrylic marker pen xtool
guangna direct liquid soft head acrylic marker pen
gold acrylic marker pen
grasp acrylic marker pen
guangna acrylic marker pen
grabie acrylic marker pen
languo acrylic marker gel pen
acrylic marker brush pen guangna
marker acrylic pen m&g
goffi acrylic paint marker pen
doloha acrylic.marker pen
deli acrylic marker pen
doms acrylic marker pen
dual tip acrylic marker pen
direct liquid soft head acrylic marker pen
direct liquid acrylic marker pen
dual tip acrylic paint pen marker
double.sided acrylic pen marker
double sided acrylic pen marker set of 24
acrylic marker pen diy
dual tip acrylic paint pen marker - 24/48/72 colours
beyond draw-dual-tip-acrylic-paint-pen-marker
fine tip acrylic marker pen
flair acrylic marker pen
marker pen for acrylic painting
marker pen for acrylic board
acrylic marker pen used for
marker pen for acrylic
white marker pen for acrylic
flair acrylic paint marker pen
ohuhu acrylic marker pen for diy
camel acrylic marker pen
carissa acrylic marker pen
camlin acrylic marker pen
acrylic colour marker pen
acrylic marker pen china
acrylic marker pen in chinese
acrylic marker brush pen 80 cores
sharpie creative marker acrylic paint pen
acrylic marker brush pen 60 crore
caneta acrylic marker brush pen
miya acrylic marker pen
metallic acrylic marker pen
led acrylic writing message board night lamp with marker pen
acrylic marker brush pen mercado livre
b&m acrylic marker pens
sketching pens & markers acrylic marker pen
marcadores acrylic marker pen
set acrylic marker pen
soft head acrylic marker pen
silver acrylic marker pen
acrylic marker 12 pen set
acrylic marker pen 48 shades
acrylic marker pen 36 shades
acrylic marker pen 12 shades
pen peak acrylic marker pen
price acrylic marker pen
posca acrylic marker pen
paint acrylic marker pen
acrylic permanent marker pen
wotek acrylic paint marker pen
acrylic paint marker pen white
acrylic marker vs paint pen
thick acrylic marker pen
the acrylic paint marker pen set
acrylic tip marker pen
is an acrylic marker the same as a paint pen
how to use acrylic marker pen
enmy acrylic marker pen
emmy acrylic marker pen
marker acrylic pen empik
what are acrylic markers
are there acrylic paint pens
how to use acrylic markers
nicety acrylic marker pen
acrylic marker pen nearby
acrylic pen marker national bookstore
what markers can you use on acrylic
ohuhu acrylic marker pen
ohuhu acrylic marker pen price
unicorn acrylic marker pen
what is acrylic marker pen
what are acrylic paint markers used for
white acrylic marker pen
@huangaa_3 guangna direct liquid soft head acrylic marker pen
hightune acrylic marker brush pen
liquid soft head acrylic marker pen
how to use acrylic marker
how do acrylic markers work
languo acrylic marker pen
liquid acrylic marker pen
best acrylic marker pens
best acrylic paint marker pens
what are the best acrylic markers
what is the best acrylic paint pens
what are the best acrylic pens
acrylic marker pen ideas
acrylic marker pen price in bangladesh
is sharpie acrylic
acrylic paint pen zeyar
acrylic paint pens new zealand
acrylic marker brush pen zjw
acrylic marker pen zuixua
restly acrylic marker pen
how long do acrylic paint pens last
are acrylic paint pens permanent
are sharpies acrylic
acrylic paint pens lyuvie
acrylic paint pens like posca
acrylic paint pens liquitex
acrylic paint pens life of colour
acrylic paint pens lowes
acrylic paint pens large
acrylic paint pens large set
acrylic paint pens languo
acrylic paint pens for leather shoes
acrylic marker pen best
acrylic marker pen blinkit
acrylic marker pens b&m
acrylic paint pen black
acrylic paint pen brush tip
acrylic paint pen brands
acrylic paint pen brush
acrylic paint pen bunnings
acrylic paint pen brown
acrylic paint pen big w
acrylic paint pen blue
acrylic paint pen by numbers
acrylic marker pen doms
acrylic marker pen deli
acrylic marker pen dual tip
acrylic paint pen drawing
acrylic paint pen drying time
acrylic paint pen designs
acrylic paint pen doodles
acrylic paint pen dry
dual tip acrylic paint.pen
acrylic paint pens dried out
acrylic paint pens desire deluxe
acrylic marker pen camlin
acrylic marker pen camel
acrylic paint pen coloring book
acrylic paint pen crafts
acrylic paint pen case
acrylic paint pen canvas
acrylic marker colour pen
acrylic paint pen car
acrylic paint pen colouring book
acrylic paint pen canada
acrylic paint pens cheap
acrylic paint pens crockd
acrylic marker pen enmy
acrylic marker pens ebay
acrylic paint pen extra fine
acrylic paint pen extra fine tip
acrylic paint pen empty
acrylic paint pens ebay
acrylic paint pens earth tones
acrylic paint pens eckersley
acrylic paint pens enmy
bia acrylic paint pen extra fine tip
sharpie acrylic paint pens earth tones
artistro acrylic paint pens extra fine tip
acrylic paint pens double ended
acrylic marker pens arrtx
acrylic marker pens arrtx simptap
acrylic paint pen art
acrylic paint pen amazon
acrylic paint pen artwork
acrylic paint pens australia
acrylic paint pens argos
acrylic paint pens asda
acrylic paint pens artistro
acrylic paint pens arrtx
acrylic paint pens at michaels

18
data/links.json Normal file

@@ -0,0 +1,18 @@
{
  "keyword": "马克笔绘画",
  "count": 12,
  "links": [
    "https://www.tiktok.com/@drawing_board8/video/7569235583214587150",
    "https://www.tiktok.com/@seekingartsupplier_my/video/7259632291306032402",
    "https://www.tiktok.com/@huangaa_3/video/7522044666745818382",
    "https://www.tiktok.com/@acrylicmarkerasmr/video/7470837517571345695",
    "https://www.tiktok.com/@fungraffiti/video/7550422681103895863",
    "https://www.tiktok.com/@yzd20328cuq/video/7553666411760012574",
    "https://www.tiktok.com/@acrylicmarkerasmr/video/7495580197811440926",
    "https://www.tiktok.com/@miss.uk3/video/7561345033010416916",
    "https://www.tiktok.com/@nashvibes/video/7472932371990416670",
    "https://www.tiktok.com/@muse1378/video/7322534704643722539",
    "https://www.tiktok.com/@miss.uk3/video/7567258402661960981",
    "https://www.tiktok.com/@miss.uk3/video/7569627806678846740"
  ]
}

25
data/store.py Normal file

@@ -0,0 +1,25 @@
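from utils.io import write_json


def save_links_snapshot(path, keywords, items, links):
    """Write the link snapshot.

    Structure: `{'keywords': list, 'items': list, 'total_count': int, 'links': list}`
    """
    # Materialize first so that len() also works when `links` is a generator or a set.
    link_list = all_links(links)
    write_json(path, {'keywords': list(keywords), 'items': items, 'total_count': len(link_list), 'links': link_list})
    return path


def save_comments_snapshot(path, items):
    """Write the comment snapshot: `{'items': items}`"""
    write_json(path, {'items': items})
    return path


def all_links(links):
    """Convert any iterable of links to a list (for JSON serialization)."""
    return list(links)
"""Snapshot writing utilities
Responsibilities:
- Write link-search results to JSON in a uniform structure
- Write comment-fetching results to JSON
Serialization only; no business logic lives here.
"""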

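The snapshot layout described in `data/store.py` can be sketched without touching the file system. This is a minimal illustration, assuming a hypothetical helper `build_links_snapshot` (not part of the commit) that returns the same dictionary `save_links_snapshot` serializes; the sample link is made up.

```python
import json


def build_links_snapshot(keywords, items, links):
    """Hypothetical helper: build the dict that the link snapshot stores."""
    link_list = list(links)  # materialize so generators and sets are accepted too
    return {
        'keywords': list(keywords),
        'items': items,
        'total_count': len(link_list),
        'links': link_list,
    }


sample_link = 'https://www.tiktok.com/@demo_user/video/1'  # made-up link
snapshot = build_links_snapshot(
    ['marker'],
    [{'keyword': 'marker', 'count': 1, 'links': [sample_link]}],
    (link for link in [sample_link]),  # a generator, to show why list() comes first
)
print(json.dumps(snapshot, ensure_ascii=False, indent=2))
```

Counting after materializing keeps `total_count` consistent with the stored `links` list even when the caller passes a one-shot iterable.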
8914177
data/tik_comments.json Normal file

File diff suppressed because one or more lines are too long

39884
data/urls.json Normal file

File diff suppressed because it is too large Load Diff

Binary file not shown.

49
db/mysql_import.py Normal file

@@ -0,0 +1,49 @@
import csv
import os


def import_csv_to_mysql(csv_path, host='localhost', port=3306, user='root', password='', database='crawler_tiktok', table='comments'):
    """Load `username,text` rows from a CSV file into a MySQL table, creating the table if needed."""
    try:
        import pymysql
    except ImportError:
        print('missing dependency: pip install pymysql', flush=True)
        raise SystemExit(1)
    if not os.path.exists(csv_path):
        print('csv not found: ' + csv_path, flush=True)
        raise SystemExit(1)
    conn = pymysql.connect(host=host, port=int(port), user=user, password=password, database=database, charset='utf8mb4')
    cur = conn.cursor()
    cur.execute(f"CREATE TABLE IF NOT EXISTS `{table}` (\n `id` BIGINT AUTO_INCREMENT PRIMARY KEY,\n `username` VARCHAR(255),\n `text` TEXT\n ) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci")
    rows = []
    with open(csv_path, 'r', encoding='utf-8', newline='') as f:
        r = csv.reader(f)
        first = True
        for row in r:
            # Skip the header row if the file starts with one.
            if first and row and row[0].lower() == 'username':
                first = False
                continue
            first = False
            if not row:
                continue
            username = row[0] if len(row) > 0 else ''
            text = row[1] if len(row) > 1 else ''
            rows.append((username, text))
    if rows:
        cur.executemany(f"INSERT INTO `{table}` (`username`,`text`) VALUES (%s,%s)", rows)
    conn.commit()
    cur.close()
    conn.close()
    print(f"inserted={len(rows)}", flush=True)


def create_database_if_not_exists(host='localhost', port=3306, user='root', password='', database='yunque'):
    """Create the database with utf8mb4 defaults if it does not exist yet."""
    try:
        import pymysql
    except ImportError:
        print('missing dependency: pip install pymysql', flush=True)
        raise SystemExit(1)
    conn = pymysql.connect(host=host, port=int(port), user=user, password=password, charset='utf8mb4')
    cur = conn.cursor()
    cur.execute(f"CREATE DATABASE IF NOT EXISTS `{database}` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci")
    conn.commit()
    cur.close()
    conn.close()
    print(f"database_ready={database}", flush=True)
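The row-preparation step of the import can be tried without a database. The sketch below assumes a hypothetical helper `read_comment_rows` (not in the commit) that mirrors the header skipping and column padding above; it reads from an in-memory string instead of `data\comments.csv`.

```python
import csv
import io


def read_comment_rows(f):
    """Hypothetical helper: turn CSV lines into (username, text) tuples."""
    rows = []
    for index, row in enumerate(csv.reader(f)):
        # Skip the header only when it is the very first row.
        if index == 0 and row and row[0].lower() == 'username':
            continue
        if not row:
            continue  # blank lines come back as empty lists
        username = row[0]
        text = row[1] if len(row) > 1 else ''  # pad a missing text column
        rows.append((username, text))
    return rows


sample = io.StringIO('username,text\nalice,"nice, pens"\n\nbob\n')
rows = read_comment_rows(sample)
print(rows)
```

Tuples in this shape are what a parameterized `executemany` call with two `%s` placeholders expects, which keeps comment text out of the SQL string itself.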

185
main.py Normal file

@@ -0,0 +1,185 @@
import argparse
import os
from utils.io import load_keywords_from_file, read_json
from tiktok.search import save_links_multi
from tiktok.comments import save_comments_from_links
from db.mysql_import import import_csv_to_mysql, create_database_if_not_exists


def run_links(args):
    """Run the link-collection stage.

    Arguments come from the command line (keywords, request file, paging, concurrency, and so on).
    Flow:
    1. Merge keywords from --keyword, --keywords and --keywords-file
    2. Check that the merged list is not empty
    3. Call `save_links_multi` to search concurrently, deduplicate, and save to `args.out`
    """
    kws = []
    if args.keyword:
        kws.extend([k for k in args.keyword if k])
    if args.keywords:
        for k in args.keywords.split(','):
            k = k.strip()
            if k:
                kws.append(k)
    if args.keywords_file:
        kws.extend(load_keywords_from_file(args.keywords_file))
    kws = [k for k in kws if k]
    if not kws:
        raise SystemExit('no keywords')
    save_links_multi(kws, out_path=args.out, file_path=args.file_path, max_pages=args.max_pages, timeout=args.timeout, count=args.count, workers=args.workers)
def run_comments(args):
    """Run the comment and reply fetching stage.

    Input: `args.links_json` (either the unified snapshot or the simple structure).
    Reading logic: prefer the `links` field; if it is missing or empty, aggregate `items[*].links`.
    Call: `save_comments_from_links` fetches concurrently and writes JSON plus an optional CSV.
    """
    obj = read_json(args.links_json)
    links = obj.get('links') or []
    if not links:
        # Fall back to the per-keyword structure and flatten `items[*].links`.
        try:
            tmp = []
            for it in obj.get('items', []):
                tmp.extend(it.get('links', []))
            links = tmp
        except Exception:
            pass
    if not links:
        raise SystemExit('no links')
    save_comments_from_links(links, out_path=args.out, file_path=args.file_path, count=args.count, pages=args.pages, timeout=args.timeout, reply_count=args.reply_count, reply_pages=args.reply_pages, total_limit=args.limit, reply_total_limit=args.reply_limit, csv_path=args.csv, workers=args.workers)
def run_all(args):
    """Run link collection and comment fetching back to back.

    1. Parse the keywords and run the search stage, writing to `args.links_out`
    2. Read the link snapshot, accepting both structures
    3. Run the comment stage, writing to `args.comments_out` and optionally to a CSV
    Intended for running the whole pipeline in one go.
    """
    kws = []
    if getattr(args, 'keyword', None):
        kws.extend([k for k in args.keyword if k])
    if getattr(args, 'keywords', None):
        for k in args.keywords.split(','):
            k = k.strip()
            if k:
                kws.append(k)
    if getattr(args, 'keywords_file', None):
        kws.extend(load_keywords_from_file(args.keywords_file))
    kws = [k for k in kws if k]
    if not kws:
        raise SystemExit('no keywords')
    save_links_multi(kws, out_path=args.links_out, file_path=args.file_path, max_pages=args.search_max_pages, timeout=args.search_timeout, count=args.search_count, workers=args.search_workers)
    obj = read_json(args.links_out)
    links = obj.get('links') or []
    if not links:
        # Fall back to the per-keyword structure and flatten `items[*].links`.
        try:
            tmp = []
            for it in obj.get('items', []):
                tmp.extend(it.get('links', []))
            links = tmp
        except Exception:
            pass
    if not links:
        raise SystemExit('no links')
    save_comments_from_links(links, out_path=args.comments_out, file_path=args.file_path, count=args.comments_count, pages=args.comments_pages, timeout=args.comments_timeout, reply_count=args.reply_count, reply_pages=args.reply_pages, total_limit=args.comments_limit, reply_total_limit=args.reply_limit, csv_path=args.csv, workers=args.comments_workers)
def main():
    """Parse the command line and dispatch to the matching subcommand function."""
    p = argparse.ArgumentParser()
    sub = p.add_subparsers(dest='cmd')
    p_links = sub.add_parser('links')
    p_links.add_argument('--keyword', action='append')
    p_links.add_argument('--keywords', default=None)
    p_links.add_argument('--keywords-file', default=None)
    p_links.add_argument('--file-path', default=r'data\1.text')
    p_links.add_argument('--out', default='data\\urls.json')
    p_links.add_argument('--max-pages', type=int, default=50)
    p_links.add_argument('--count', type=int, default=None)
    p_links.add_argument('--timeout', type=int, default=30)
    p_links.add_argument('--workers', type=int, default=5)
    p_links.set_defaults(func=run_links)
    p_comments = sub.add_parser('comments')
    p_comments.add_argument('--links-json', default='data\\urls.json')
    p_comments.add_argument('--out', default='data\\tik_comments.json')
    # A raw string needs a single backslash; r'data\\1.text' would contain two.
    p_comments.add_argument('--file-path', default=r'data\1.text')
    p_comments.add_argument('--count', type=int, default=100)
    p_comments.add_argument('--pages', type=int, default=100)
    p_comments.add_argument('--timeout', type=int, default=30)
    p_comments.add_argument('--limit', type=int, default=None)
    p_comments.add_argument('--reply-count', type=int, default=100)
    p_comments.add_argument('--reply-pages', type=int, default=100)
    p_comments.add_argument('--reply-limit', type=int, default=None)
    p_comments.add_argument('--csv', default='data\\comments.csv')
    p_comments.add_argument('--workers', type=int, default=None)
    p_comments.set_defaults(func=run_comments)
    p_all = sub.add_parser('all')
    p_all.add_argument('--keyword', action='append')
    p_all.add_argument('--keywords', default=None)
    p_all.add_argument('--keywords-file', default=None)
    p_all.add_argument('--file-path', default=r'data\1.text')
    p_all.add_argument('--links-out', default='data\\urls.json')
    p_all.add_argument('--search-max-pages', type=int, default=50)
    p_all.add_argument('--search-count', type=int, default=None)
    p_all.add_argument('--search-timeout', type=int, default=30)
    p_all.add_argument('--search-workers', type=int, default=5)
    p_all.add_argument('--comments-out', default='data\\tik_comments.json')
    p_all.add_argument('--comments-count', type=int, default=100)
    p_all.add_argument('--comments-pages', type=int, default=100)
    p_all.add_argument('--comments-timeout', type=int, default=30)
    p_all.add_argument('--comments-limit', type=int, default=None)
    p_all.add_argument('--reply-count', type=int, default=100)
    p_all.add_argument('--reply-pages', type=int, default=100)
    p_all.add_argument('--reply-limit', type=int, default=None)
    p_all.add_argument('--csv', default='data\\comments.csv')
    p_all.add_argument('--comments-workers', type=int, default=None)
    p_all.set_defaults(func=run_all)
    p_mysql = sub.add_parser('mysql')
    p_mysql.add_argument('--csv', default='data\\comments.csv')
    p_mysql.add_argument('--host', default='localhost')
    p_mysql.add_argument('--port', type=int, default=3306)
    p_mysql.add_argument('--user', default='root')
    p_mysql.add_argument('--password', default='')
    p_mysql.add_argument('--database', default='crawler_tiktok')
    p_mysql.add_argument('--table', default='comments')

    def run_mysql(args):
        import_csv_to_mysql(args.csv, host=args.host, port=args.port, user=args.user, password=args.password, database=args.database, table=args.table)
    p_mysql.set_defaults(func=run_mysql)
    p_mysql_db = sub.add_parser('mysql-db')
    p_mysql_db.add_argument('--host', default='localhost')
    p_mysql_db.add_argument('--port', type=int, default=3306)
    p_mysql_db.add_argument('--user', default='root')
    p_mysql_db.add_argument('--password', default='')
    p_mysql_db.add_argument('--database', default='yunque')

    def run_mysql_db(args):
        create_database_if_not_exists(host=args.host, port=args.port, user=args.user, password=args.password, database=args.database)
    p_mysql_db.set_defaults(func=run_mysql_db)
    args = p.parse_args()
    if not args.cmd:
        p.print_help()
        raise SystemExit(1)
    args.func(args)


if __name__ == '__main__':
    main()
"""Command-line entry module
Subcommands:
- links: search video links concurrently by keyword and save a snapshot
- comments: fetch comments and replies for a list of links, saving a snapshot and a CSV
- all: chain links and comments to run the whole pipeline in one go
- mysql / mysql-db: import the comment CSV into MySQL / create the database
Running as `python -m crawler_tiktok.main ...` is suggested to avoid import-path problems.
"""

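The three keyword sources accepted by `links` and `all` merge in a fixed order: repeated `--keyword` values, then the comma-separated `--keywords` string, then the keyword file. A minimal sketch of that merge follows, assuming a hypothetical helper `merge_keywords` (the commit inlines this logic in `run_links` and `run_all`) and made-up sample keywords.

```python
import argparse


def merge_keywords(keyword=None, keywords=None, file_keywords=None):
    """Hypothetical helper: merge the three keyword sources, dropping blanks."""
    merged = [k for k in (keyword or []) if k]
    if keywords:
        # Split the comma-separated string and trim whitespace around each entry.
        merged.extend(k.strip() for k in keywords.split(',') if k.strip())
    merged.extend(k for k in (file_keywords or []) if k)
    return merged


parser = argparse.ArgumentParser()
parser.add_argument('--keyword', action='append')  # may be given several times
parser.add_argument('--keywords', default=None)    # one comma-separated string
args = parser.parse_args([
    '--keyword', 'acrylic marker',
    '--keywords', 'posca, ohuhu ,',
    '--keyword', 'paint pen',
])
kws = merge_keywords(args.keyword, args.keywords)
print(kws)
```

Note that the merge does not deduplicate keywords; a keyword given twice is searched twice, and only the resulting links are deduplicated later.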
1
query Normal file

@@ -0,0 +1 @@
MySQL80

Binary file not shown.

Binary file not shown.

208
tiktok/comments.py Normal file

@@ -0,0 +1,208 @@
import json
import re
import threading
import time
from urllib.parse import urlparse, parse_qs, urlencode
from urllib.request import Request, urlopen
from core.curl import parse_curl_file
from utils.io import ensure_csv_header, append_csv_rows
from data.store import save_comments_snapshot


def _extract_aweme_id(link):
    """Extract the aweme_id from a video link (the digits after `/video/`)."""
    m = re.search(r"/video/(\d+)", link)
    return m.group(1) if m else None
def fetch_comments_aweme(aweme_id, file_path, count=20, max_pages=50, timeout=30, total_limit=None, referer=None):
    """Fetch the comments of one video page by page.

    Parameters:
    - `aweme_id`: video ID
    - `file_path`: cURL text file (block 1 is the baseline for the comment endpoint)
    - `count` / `max_pages` / `timeout`: paging and timeout controls
    - `total_limit`: optional cap on the total number of comments
    - `referer`: optional source page used for the `referer` request header
    Behaviour: retries on failure, switches to a fallback comment endpoint when needed, and follows `has_more` / `next_cursor`.
    Returns: a list of comment objects.
    """
    reqs = parse_curl_file(file_path)
    if not reqs:
        return []
    base = reqs[0]
    headers = dict(base['headers'])
    if referer:
        headers['referer'] = referer
    cursor = 0
    all_comments = []
    for _ in range(max_pages):
        u_parsed = urlparse(base['url'])
        q = parse_qs(u_parsed.query)
        q['aweme_id'] = [str(aweme_id)]
        q['count'] = [str(count)]
        q['cursor'] = [str(cursor)]
        u = u_parsed._replace(query=urlencode(q, doseq=True)).geturl()
        data = None
        # Up to three attempts, pausing a little longer after each failure.
        for i in range(3):
            try:
                req = Request(u, headers=headers, method='GET')
                with urlopen(req, timeout=timeout) as resp:
                    data = resp.read()
                break
            except Exception:
                time.sleep(0.5 * (i + 1))
                data = None
        try:
            obj = json.loads(data.decode('utf-8', errors='ignore'))
        except Exception:
            obj = {}
        if not obj.get('comments'):
            # Fallback: query the generic comment list endpoint with minimal parameters.
            alt_params = {'aid': 1988, 'aweme_id': aweme_id, 'count': count, 'cursor': cursor}
            alt_url = 'https://www.tiktok.com/api/comment/list/?' + urlencode(alt_params)
            for i in range(2):
                try:
                    req = Request(alt_url, headers=headers, method='GET')
                    with urlopen(req, timeout=timeout) as resp:
                        data2 = resp.read()
                    obj2 = json.loads(data2.decode('utf-8', errors='ignore'))
                    if obj2.get('comments'):
                        obj = obj2
                        break
                except Exception:
                    time.sleep(0.5 * (i + 1))
        comments = obj.get('comments') or []
        limit_reached = False
        for c in comments:
            all_comments.append(c)
            if isinstance(total_limit, int) and total_limit > 0 and len(all_comments) >= total_limit:
                limit_reached = True
                break
        if limit_reached:
            # Stop paging at the cap; otherwise each later page would add one more comment.
            break
        has_more = obj.get('has_more')
        next_cursor = obj.get('cursor') or obj.get('next_cursor')
        if has_more in (True, 1) and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        if comments and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        break
    return all_comments
def fetch_replies(comment_id, aweme_id, file_path, count=20, max_pages=50, timeout=30, total_limit=None):
    """Fetch the second-level replies of one comment page by page.

    Parameters: `comment_id` / `aweme_id` identify the comment; the rest match comment fetching.
    Returns: a list of reply objects.
    """
    reqs = parse_curl_file(file_path)
    if not reqs:
        return []
    headers = reqs[0]['headers']
    base = 'https://www.tiktok.com/api/comment/list/reply/'
    cursor = 0
    replies = []
    for _ in range(max_pages):
        params = {'aid': 1988, 'aweme_id': aweme_id, 'comment_id': comment_id, 'count': count, 'cursor': cursor}
        url = base + '?' + urlencode(params)
        data = None
        # Up to three attempts, pausing a little longer after each failure.
        for i in range(3):
            try:
                req = Request(url, headers=headers, method='GET')
                with urlopen(req, timeout=timeout) as resp:
                    data = resp.read()
                break
            except Exception:
                time.sleep(0.5 * (i + 1))
                data = None
        try:
            obj = json.loads(data.decode('utf-8', errors='ignore'))
        except Exception:
            obj = {}
        arr = obj.get('comments') or []
        limit_reached = False
        for r in arr:
            replies.append(r)
            if isinstance(total_limit, int) and total_limit > 0 and len(replies) >= total_limit:
                limit_reached = True
                break
        if limit_reached:
            # Stop paging at the cap instead of only leaving the inner loop.
            break
        has_more = obj.get('has_more')
        next_cursor = obj.get('cursor')
        if has_more in (True, 1) and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        break
    return replies
_csv_lock = threading.Lock()
_print_lock = threading.Lock()
_results_lock = threading.Lock()


def save_comments_from_links(links, out_path, file_path, count=20, pages=50, timeout=30, reply_count=20, reply_pages=50, total_limit=None, reply_total_limit=None, csv_path=None, workers=None):
    """Fetch comments and replies for the video links concurrently and save a snapshot.

    Concurrency: one thread per link, optionally throttled by a semaphore.
    CSV: when `csv_path` is given, top-level comments and replies are appended as `username,text`.
    Output: written to `out_path` as `{'items': [{link, count, comments: [...]}, ...]}`.
    """
    ensure_csv_header(csv_path, ['username', 'text'])
    results = []
    sem = None
    if isinstance(workers, int) and workers > 0:
        sem = threading.Semaphore(workers)

    def _process(link):
        if sem:
            sem.acquire()
        with _print_lock:
            print(f"[START] {link}", flush=True)
        try:
            cs = fetch_comments_aweme(_extract_aweme_id(link), file_path=file_path, count=count, max_pages=pages, timeout=timeout, total_limit=total_limit, referer=link)
            enriched = []
            for c in cs:
                cid = c.get('cid')
                if cid:
                    rs = fetch_replies(cid, _extract_aweme_id(link), file_path=file_path, count=reply_count, max_pages=reply_pages, timeout=timeout, total_limit=reply_total_limit)
                    c = dict(c)
                    c['replies'] = rs
                    c['reply_count'] = len(rs)
                enriched.append(c)
                try:
                    with _print_lock:
                        print(f"{link} | cid={c.get('cid')} | create_time={c.get('create_time')} | reply_count={c.get('reply_count', 0)} | text={c.get('text')}", flush=True)
                except Exception:
                    pass
                if csv_path:
                    u = c.get('user') or {}
                    uname = u.get('unique_id') or u.get('nickname') or u.get('uid') or ''
                    rows = [[uname, c.get('text')]]
                    for r in c.get('replies', []) or []:
                        ru = r.get('user') or {}
                        runame = ru.get('unique_id') or ru.get('nickname') or ru.get('uid') or ''
                        rows.append([runame, r.get('text')])
                    with _csv_lock:
                        append_csv_rows(csv_path, rows)
            with _results_lock:
                results.append({'link': link, 'count': len(cs), 'comments': enriched})
            reply_total = sum(len(c.get('replies') or []) for c in enriched)
            with _print_lock:
                print(f"[DONE] {link} comments={len(cs)} replies={reply_total}", flush=True)
        except Exception as e:
            with _print_lock:
                print(f"[ERROR] {link} {e}", flush=True)
        finally:
            if sem:
                sem.release()
    threads = []
    for link in links:
        t = threading.Thread(target=_process, args=(link,))
        t.daemon = True
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    save_comments_snapshot(out_path, results)
    return out_path
"""TikTok comment and reply fetching module
Capabilities:
- Extract the aweme_id from a video link
- Page through the comment endpoint (with a fallback endpoint)
- Fetch the second-level replies of each comment and merge them in
- Optionally write a CSV and print progress logs
"""

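The cursor-based paging used for comments and replies can be exercised offline. The sketch below is an illustration under stated assumptions, not the module's code: `paginate` and `fake_fetch` are hypothetical names, and the fake page shape (`comments`, `has_more`, `cursor`) simply follows the fields the functions above read.

```python
def paginate(fetch_page, max_pages=50, total_limit=None):
    """Hypothetical helper: follow `has_more`/`cursor` until done or capped."""
    cursor = 0
    collected = []
    for _ in range(max_pages):
        page = fetch_page(cursor) or {}
        for comment in page.get('comments') or []:
            collected.append(comment)
            if total_limit and len(collected) >= total_limit:
                return collected  # the cap ends paging, not just this page
        next_cursor = page.get('cursor')
        if page.get('has_more') in (True, 1) and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        break
    return collected


def fake_fetch(cursor, page_size=3, total=7):
    """Stand-in for the HTTP call: serves `total` fake comments in pages."""
    chunk = list(range(total))[cursor:cursor + page_size]
    next_cursor = cursor + len(chunk)
    return {
        'comments': [{'cid': str(i)} for i in chunk],
        'has_more': 1 if next_cursor < total else 0,
        'cursor': next_cursor,
    }


everything = paginate(fake_fetch)
capped = paginate(fake_fetch, total_limit=4)
print(len(everything), [c['cid'] for c in capped])
```

Returning from inside the inner loop is one way to make the cap stop the outer paging loop as well; a flag checked after the inner loop works equally well.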
171
tiktok/search.py Normal file

@@ -0,0 +1,171 @@
import json
import re
import threading
import time
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse
from urllib.request import Request, urlopen
from core.curl import parse_curl_file
from data.store import save_links_snapshot


def _update_query(url, updates):
    """Apply `updates` to the query parameters of `url` and return the new URL."""
    p = urlparse(url)
    q = parse_qs(p.query)
    for k, v in updates.items():
        q[k] = [str(v)]
    new_q = urlencode(q, doseq=True)
    return urlunparse((p.scheme, p.netloc, p.path, p.params, new_q, p.fragment))
def _extract_links(obj):
    """Extract video links from a response object.

    Preferred source: `data -> item`, combining `author.uniqueId` with `item.id`.
    Fallback: walk every string field and match TikTok links with regular expressions.
    Returns: a list of links (may contain duplicates; the caller deduplicates).
    """
    links = []
    data = obj.get('data') if isinstance(obj, dict) else None
    if isinstance(data, list):
        for e in data:
            if isinstance(e, dict) and e.get('type') == 1 and isinstance(e.get('item'), dict):
                it = e['item']
                author = it.get('author') or {}
                uid = author.get('uniqueId')
                vid = it.get('id')
                if uid and vid:
                    links.append(f"https://www.tiktok.com/@{uid}/video/{vid}")
    patterns = [
        r"https?://www\.tiktok\.com/[\w@._-]+/video/\d+",
        r"https?://www\.tiktok\.com/video/\d+",
        r"https?://vm\.tiktok\.com/[\w-]+",
        r"https?://vt\.tiktok\.com/[\w-]+",
    ]

    def rec(x):
        if isinstance(x, dict):
            for v in x.values():
                rec(v)
        elif isinstance(x, list):
            for v in x:
                rec(v)
        elif isinstance(x, str):
            s = x
            for pat in patterns:
                for m in re.finditer(pat, s):
                    links.append(m.group(0))
    rec(obj)
    return links
def search_video_links(keyword, file_path, max_pages=50, timeout=30, count=None, on_link=None):
    """Search video links for one keyword, page by page.

    Input: the baseline URL and headers come from the 2nd curl request in `file_path`.
    Behaviour: pages through results, retries, parses links, and calls `on_link` for each new link.
    Returns: the links found for this keyword, deduplicated locally; the caller deduplicates across keywords.
    """
    reqs = parse_curl_file(file_path)
    if len(reqs) < 2:
        return []
    base = reqs[1]
    headers = base['headers']
    parsed = urlparse(base['url'])
    q = parse_qs(parsed.query)
    if count is None:
        # Default to the page size of the captured request, or 12 if it has none.
        if 'count' in q:
            try:
                count = int(q['count'][0])
            except Exception:
                count = 12
        else:
            count = 12
    all_links = []
    seen = set()
    offset = 0
    cursor = None
    for _ in range(max_pages):
        params = {'keyword': keyword, 'count': count}
        if cursor is not None:
            params['offset'] = cursor
        else:
            params['offset'] = offset
        u = _update_query(base['url'], params)
        data = None
        # Up to three attempts, pausing a little longer after each failure.
        for i in range(3):
            try:
                req = Request(u, headers=headers, method='GET')
                with urlopen(req, timeout=timeout) as resp:
                    data = resp.read()
                break
            except Exception:
                time.sleep(0.5 * (i + 1))
                data = None
        try:
            obj = json.loads(data.decode('utf-8', errors='ignore'))
        except Exception:
            obj = {}
        links = _extract_links(obj)
        has_more = obj.get('has_more')
        next_cursor = obj.get('cursor')
        new = 0
        for l in links:
            if l not in seen:
                seen.add(l)
                all_links.append(l)
                new += 1
                if on_link:
                    try:
                        on_link(l)
                    except Exception:
                        pass
        if has_more in (True, 1) and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        if new == 0:
            break
        offset += count
    return all_links
_print_lock = threading.Lock()


def save_links_multi(keywords, out_path, file_path, max_pages=50, timeout=30, count=None, workers=5):
    """Search several keywords concurrently and save a snapshot.

    Concurrency: one thread per keyword, throttled by a semaphore; links are deduplicated across keywords.
    Output: written to `out_path` with `keywords/items/total_count/links`.
    """
    all_links = []
    seen = set()
    items = []
    seen_lock = threading.Lock()
    sem = threading.Semaphore(max(1, int(workers)))

    def worker(kw):
        with sem:
            item_links = []

            def on_new(l):
                # Only links not yet seen under any keyword are recorded for this keyword.
                with seen_lock:
                    if l not in seen:
                        seen.add(l)
                        all_links.append(l)
                        item_links.append(l)
                        with _print_lock:
                            print(l, flush=True)
            search_video_links(kw, file_path=file_path, max_pages=max_pages, timeout=timeout, count=count, on_link=on_new)
            items.append({'keyword': kw, 'count': len(item_links), 'links': item_links})
    threads = []
    for kw in keywords:
        t = threading.Thread(target=worker, args=(kw,))
        t.daemon = True
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    save_links_snapshot(out_path, keywords, items, all_links)
    return out_path
"""TikTok video link search module
Core capabilities:
- Build the query URL (updating keyword/offset/count and other parameters)
- Send requests and parse video links from the response (structured fields plus a regex fallback)
- Search several keywords concurrently, deduplicate across them, and save a snapshot
"""

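The regex fallback in `_extract_links` walks an arbitrary JSON-like structure and collects anything that looks like a video URL. A minimal sketch of that idea follows, assuming a hypothetical function `find_links`, a made-up payload, and only the first of the four patterns.

```python
import re

# Same shape as the first pattern in `_extract_links`.
VIDEO_LINK = re.compile(r"https?://www\.tiktok\.com/[\w@._-]+/video/\d+")


def find_links(node):
    """Hypothetical helper: recursively collect video links from nested data."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(find_links(value))
    elif isinstance(node, list):
        for value in node:
            found.extend(find_links(value))
    elif isinstance(node, str):
        found.extend(m.group(0) for m in VIDEO_LINK.finditer(node))
    return found


payload = {  # made-up response fragment
    'data': [
        {'share': 'see https://www.tiktok.com/@demo_user/video/123 now'},
        {'nested': ['https://www.tiktok.com/@demo_user/video/123', 'no link here']},
    ]
}
links = find_links(payload)
unique = list(dict.fromkeys(links))  # order-preserving deduplication
print(links, unique)
```

Because the same URL often appears in several fields of one response, the raw result contains repeats; `dict.fromkeys` is one compact way to deduplicate while keeping first-seen order, playing the role of the `seen` set above.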
Binary file not shown.

55
utils/filter_comments.py Normal file

@@ -0,0 +1,55 @@
import argparse
import csv
import os


def filter_comments(csv_in, csv_out, keywords):
    """Keep the CSV rows whose text contains any keyword (case-insensitive substring match)."""
    ks = set(k.lower() for k in keywords if k)
    rows_out = []
    with open(csv_in, 'r', encoding='utf-8', newline='') as f:
        r = csv.reader(f)
        first = True
        for row in r:
            # Skip the header row if the file starts with one.
            if first and row and row[0].lower() == 'username':
                first = False
                continue
            first = False
            if not row:
                continue
            text = row[1] if len(row) > 1 else ''
            s = (text or '').lower()
            if any(k in s for k in ks):
                rows_out.append(row)
    out_dir = os.path.dirname(csv_out)
    if out_dir:
        # os.makedirs('') raises, so create the directory only when the path has one.
        os.makedirs(out_dir, exist_ok=True)
    with open(csv_out, 'w', encoding='utf-8', newline='') as wf:
        w = csv.writer(wf)
        w.writerow(['username', 'text'])
        for r in rows_out:
            w.writerow(r)
    print(f"input={csv_in} keywords={len(ks)} matched_rows={len(rows_out)} out={csv_out}")


def main():
    p = argparse.ArgumentParser()
    p.add_argument('--extern-keywords', default=r'd:\work\test\test\all_keywords.txt')
    p.add_argument('--local-keywords', default=r'data\keyword.txt')
    p.add_argument('--csv-in', default=r'data\comments.csv')
    p.add_argument('--csv-out', default=r'data\key_comment.csv')
    args = p.parse_args()

    def _load(path):
        # Read one keyword per line; a missing or unreadable file yields an empty list.
        arr = []
        try:
            with open(path, 'r', encoding='utf-8') as f:
                for line in f:
                    s = line.strip()
                    if s:
                        arr.append(s)
        except Exception:
            arr = []
        return arr
    kws = []
    kws.extend(_load(args.extern_keywords))
    kws.extend(_load(args.local_keywords))
    kws.append('pen')
    filter_comments(args.csv_in, args.csv_out, kws)


if __name__ == '__main__':
    main()
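The matching rule in `filter_comments` is a plain case-insensitive substring test, which is worth seeing on a few rows because short keywords match inside longer words. The sketch assumes a hypothetical helper `match_rows` and made-up comment rows.

```python
def match_rows(rows, keywords):
    """Hypothetical helper: keep rows whose text column contains any keyword."""
    needles = {k.lower() for k in keywords if k}
    kept = []
    for row in rows:
        text = (row[1] if len(row) > 1 else '') or ''
        if any(needle in text.lower() for needle in needles):
            kept.append(row)
    return kept


rows = [
    ['alice', 'Love this PEN'],
    ['bob', 'what happens next'],  # contains "pen" inside "happens"
    ['carol', 'nice colors'],
]
matched = match_rows(rows, ['pen'])
print([row[0] for row in matched])
```

Since the script always appends the keyword `pen`, comments such as "what happens next" pass the filter as well; a word-boundary regular expression would be one way to tighten this if that is not intended.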

54
utils/io.py Normal file

@@ -0,0 +1,54 @@
import json
import os
import csv


def load_keywords_from_file(path):
    """Read the keyword file line by line, skip blank lines, and return a list."""
    arr = []
    try:
        with open(path, 'r', encoding='utf-8') as f:
            for line in f:
                s = line.strip()
                if s:
                    arr.append(s)
    except Exception:
        arr = []
    return arr


def write_json(path, obj):
    """Write JSON as UTF-8, keeping non-ASCII characters and using indentation."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(obj, f, ensure_ascii=False, indent=2)


def read_json(path):
    """Read a JSON file; return an empty object on failure."""
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except Exception:
        return {}


def ensure_csv_header(path, headers):
    """Create the CSV and write the header row if the file does not exist; return at once for an empty path."""
    if not path:
        return
    if not os.path.exists(path):
        with open(path, 'w', newline='', encoding='utf-8') as wf:
            w = csv.writer(wf)
            w.writerow(headers)


def append_csv_rows(path, rows):
    """Append rows to the CSV, each row given as a list; return at once for an empty path."""
    if not path:
        return
    with open(path, 'a', newline='', encoding='utf-8') as af:
        w = csv.writer(af)
        for r in rows:
            w.writerow(r)
"""General IO utilities
Provides:
- Keyword file loading
- JSON reading and writing
- CSV writing (ensure the header, append rows)
"""