init commit
This commit is contained in:
194
README.md
Normal file
@@ -0,0 +1,194 @@
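# crawler_tiktok

A TikTok data scraping script that runs in two stages:

- Search for video links by keyword and write a snapshot (`links`)
- Fetch comments and second-level replies for those video links, and write a snapshot plus an optional CSV (`comments`)

The scraper itself is built on the Python standard library (`urllib`, `threading`, and so on) and needs no third-party packages; only the optional MySQL import step requires `pymysql`.
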
## Directory structure

```
crawler_tiktok/
├─ core/              # cURL text parsing and request sending
│  └─ curl.py
├─ tiktok/            # TikTok-specific logic
│  ├─ search.py       # Search video links by keyword
│  └─ comments.py     # Fetch comments and second-level replies
├─ data/              # Sample data and output directory
│  ├─ 1.text          # cURL text (contains multiple curl command blocks)
│  ├─ keyword.txt     # Keyword file (one keyword per line)
│  ├─ urls.json       # Link search snapshot output (a sample is included)
│  ├─ comments.csv    # Comment CSV output (optional)
│  └─ store.py        # Shared snapshot writer
├─ utils/             # General-purpose IO helpers
│  └─ io.py
├─ main.py            # Command-line entry point (subcommands: links / comments / all)
└─ __init__.py        # Package entry point
```

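## Prerequisites

- Install Python (3.8+ recommended)
- Prepare `data/1.text`:
  - Open TikTok in a browser and log in, select the relevant request in the Network panel of the developer tools, and copy it with "Copy as cURL".
  - Put the `curl ...` command for the comment API in the first block and the one for the search API in the second block; a line break between the two blocks is enough.
  - Keep the request headers (especially `cookie`) so that the API responds normally.
- Prepare the keyword file `data/keyword.txt` (one keyword per line), or pass keywords on the command line.
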
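The two-block layout of `data/1.text` can be illustrated with a minimal sketch. It assumes single-quoted arguments, as produced by the browser's bash-style "Copy as cURL" output; the helper name `split_and_parse` and the shortened sample text are hypothetical, and the project's own parser lives in `core/curl.py`.

```python
import re

# Shortened, made-up stand-in for data/1.text: comment API first, search API second
SAMPLE = """curl 'https://www.tiktok.com/api/comment/list/?aweme_id=1&cursor=0' \\
  -H 'accept: */*' \\
  -b 'sessionid=abc'

curl 'https://www.tiktok.com/api/search/general/full/?keyword=x&offset=0' \\
  -H 'accept: */*'
"""


def split_and_parse(text):
    """Split text at each `curl ` and pull out URL, headers and cookie."""
    starts = [m.start() for m in re.finditer(r"\bcurl\s", text)]
    parsed = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(text)
        block = text[start:end]
        url_m = re.search(r"curl\s+'(.*?)'", block, re.S)
        if not url_m:
            continue  # skip blocks without a quoted URL
        headers = {}
        for name, value in re.findall(r"-H\s+'([^:]+):\s*(.*?)'", block):
            headers[name.strip().lower()] = value.strip()
        cookie_m = re.search(r"-b\s+'(.*?)'", block, re.S)
        if cookie_m:
            headers["cookie"] = cookie_m.group(1)
        parsed.append({"url": url_m.group(1), "headers": headers})
    return parsed


parsed = split_and_parse(SAMPLE)
comment_request, search_request = parsed[0], parsed[1]
print(comment_request["url"])
print(search_request["headers"])
```

Because the blocks are picked by position, swapping the two commands in the file would send search parameters to the comment stage and vice versa.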
## Quick start

Run the script directly from the repository root (which must be `D:\work\crawler_tiktok`):

```
python main.py -h
```

### 1) Search video links (links)

Searches the keywords concurrently, deduplicates the combined results, and saves them to `urls.json`.

```
python main.py links \
  --keywords-file data\keyword.txt \
  --file-path data\1.text \
  --out data\urls.json \
  --max-pages 50 \
  --count 12 \
  --timeout 30 \
  --workers 5
```

Optional:
- Pass `--keyword` repeatedly to supply several keywords (can be combined with `--keywords-file`)
- `--keywords`: a comma-separated string of keywords

Example structure of the `urls.json` output:

```json
{
  "keywords": ["xxx", "yyy"],
  "items": [
    {"keyword": "xxx", "count": 10, "links": ["https://www.tiktok.com/@user/video/123", ...]},
    {"keyword": "yyy", "count": 8, "links": [ ... ]}
  ],
  "total_count": 17855,
  "links": ["https://www.tiktok.com/@user/video/123", ...]
}
```

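A minimal sketch of how per-keyword results could be merged into the snapshot structure shown above. The function name `build_links_snapshot` and its input shape (keyword mapped to a list of links) are hypothetical, and treating `total_count` as the number of links left after global deduplication is an assumption, not something the README confirms.

```python
import json


def build_links_snapshot(results):
    """Combine per-keyword link lists into one deduplicated snapshot.

    `results` maps keyword -> list of links (hypothetical input shape).
    """
    items = []
    seen = set()
    all_links = []
    for keyword, links in results.items():
        # Deduplicate within the keyword, keeping first-seen order
        unique = list(dict.fromkeys(links))
        items.append({"keyword": keyword, "count": len(unique), "links": unique})
        for link in unique:
            if link not in seen:
                seen.add(link)
                all_links.append(link)
    return {
        "keywords": list(results),
        "items": items,
        # Assumption: total_count counts links after global deduplication
        "total_count": len(all_links),
        "links": all_links,
    }


snapshot = build_links_snapshot({
    "xxx": ["https://www.tiktok.com/@user/video/123",
            "https://www.tiktok.com/@user/video/456",
            "https://www.tiktok.com/@user/video/123"],
    "yyy": ["https://www.tiktok.com/@user/video/123"],
})
print(json.dumps(snapshot, indent=2))
```

Keeping first-seen order (rather than using a plain `set`) makes repeated runs produce stable, diff-friendly snapshots.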
### 2) Fetch comments and replies (comments)

Reads links from the link snapshot, fetches top-level comments and second-level replies, and saves them as JSON plus an optional CSV.

```
python main.py comments \
  --links-json data\urls.json \
  --out data\tik_comments.json \
  --file-path data\1.text \
  --count 100 \
  --pages 100 \
  --timeout 30 \
  --reply-count 100 \
  --reply-pages 100 \
  --csv data\comments.csv \
  --workers 8
```

Example structure of the `tik_comments.json` output:

```json
{
  "items": [
    {
      "link": "https://www.tiktok.com/@user/video/123",
      "count": 42,
      "comments": [
        {
          "cid": "xxx",
          "text": "...",
          "user": {"unique_id": "..."},
          "replies": [{"text": "..."}, ...],
          "reply_count": 3
        }
      ]
    }
  ]
}
```

If `--csv` is given, top-level comments and replies are each appended to that file as `username,text` rows.

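The `username,text` flattening can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the helper names are hypothetical, and it assumes replies carry the same `user.unique_id` field as top-level comments, which the example structure above does not show.

```python
import csv
import io


def _row(entry):
    # Assumption: replies carry the same `user.unique_id` field as comments
    user = entry.get("user") or {}
    return user.get("unique_id", ""), entry.get("text", "")


def iter_csv_rows(snapshot):
    """Yield (username, text) for every comment, each followed by its replies."""
    for item in snapshot.get("items", []):
        for comment in item.get("comments", []):
            yield _row(comment)
            for reply in comment.get("replies", []):
                yield _row(reply)


snapshot = {
    "items": [{
        "link": "https://www.tiktok.com/@user/video/123",
        "count": 1,
        "comments": [{
            "cid": "1",
            "text": "nice, really",
            "user": {"unique_id": "alice"},
            "replies": [{"text": "thanks", "user": {"unique_id": "bob"}}],
            "reply_count": 1,
        }],
    }]
}

rows = list(iter_csv_rows(snapshot))
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)  # the csv module quotes commas in text
print(buffer.getvalue())
```

When appending to a real file, opening it with `open(path, "a", newline="", encoding="utf-8")` lets the `csv` module control line endings and avoids blank rows on Windows.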
### 3) End-to-end run (all)

Chains link search and comment fetching in a single run, which suits pipeline execution:

```
python main.py all \
  --keywords-file data\keyword.txt \
  --file-path data\1.text \
  --links-out data\urls.json \
  --search-max-pages 50 \
  --search-count 12 \
  --search-timeout 30 \
  --search-workers 5 \
  --comments-out data\tik_comments.json \
  --comments-count 100 \
  --comments-pages 100 \
  --comments-timeout 30 \
  --comments-limit 1000 \
  --reply-count 100 \
  --reply-pages 100 \
  --reply-limit 2000 \
  --csv data\comments.csv \
  --comments-workers 8
```

### 4) Write to MySQL (import from CSV)

Run in `D:\work\crawler_tiktok`:

```
pip install pymysql
python main.py mysql \
  --csv data\comments.csv \
  --host localhost \
  --port 3306 \
  --user root \
  --password <your password> \
  --database crawler_tiktok \
  --table comments
```

If the database does not exist yet, create it in MySQL first:

```
CREATE DATABASE IF NOT EXISTS `crawler_tiktok` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

The import creates the table in the given database automatically if it does not exist, then batch-inserts the two columns `username` and `text`.

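The create-then-batch-insert flow might look roughly like the sketch below. Everything beyond the two column names is an assumption: the table schema, the helper names `batched` and `import_rows`, the batch size, and the absence of a CSV header row are all hypothetical. The function accepts any open DB-API connection that uses the `%s` parameter style (for example one returned by `pymysql.connect(...)`); a small recording stand-in is used here so the sketch runs without a database server.

```python
import csv
import io
import re


def batched(rows, size):
    """Yield lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def import_rows(conn, rows, table, batch_size=1000):
    """Create the table if needed and insert (username, text) rows in batches."""
    # Identifiers cannot be bound as parameters, so restrict them strictly
    if not re.fullmatch(r"[A-Za-z0-9_]+", table):
        raise ValueError(f"unsafe table name: {table!r}")
    # Assumed schema; the real importer may define the table differently
    create_sql = (
        f"CREATE TABLE IF NOT EXISTS `{table}` ("
        "`id` BIGINT AUTO_INCREMENT PRIMARY KEY, "
        "`username` VARCHAR(255), "
        "`text` TEXT"
        ") DEFAULT CHARSET=utf8mb4"
    )
    insert_sql = f"INSERT INTO `{table}` (`username`, `text`) VALUES (%s, %s)"
    inserted = 0
    cur = conn.cursor()
    try:
        cur.execute(create_sql)
        for batch in batched(rows, batch_size):
            cur.executemany(insert_sql, batch)
            inserted += len(batch)
        conn.commit()
    finally:
        cur.close()
    return inserted


class RecordingConnection:
    """Stand-in for a real connection; it only records what it is asked to run."""

    def __init__(self):
        self.statements = []
        self.committed = False

    def cursor(self):
        return self

    def execute(self, sql, params=None):
        self.statements.append((sql, params))

    def executemany(self, sql, seq):
        self.statements.append((sql, list(seq)))

    def close(self):
        pass

    def commit(self):
        self.committed = True


csv_text = "alice,nice video\nbob,thanks\ncarol,wow\n"
rows = [tuple(r) for r in csv.reader(io.StringIO(csv_text)) if len(r) == 2]
conn = RecordingConnection()
count = import_rows(conn, rows, "comments", batch_size=2)
print(count, len(conn.statements))
```

Passing values through `%s` placeholders rather than string formatting keeps comment text with quotes or emoji from breaking the statement.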
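## Key parameters

- `--keyword` / `--keywords` / `--keywords-file`: three ways to supply keywords; they are merged and deduplicated.
- `--file-path`: path to the cURL text file (containing multiple `curl ...` command blocks).
  - Block 1 is the baseline request for the comment API.
  - Block 2 is the baseline request for the search API.
- Search stage: `--max-pages` caps the number of pagination rounds; `--count` is the number of results per page (by default inferred from the URL, usually 12); `--workers` is the number of concurrent threads.
- Comment stage: `--pages` caps the number of comment pages; `--count` is the number of comments per page; `--reply-count` / `--reply-pages` set the replies per page and the cap on reply pages; `--workers` is the number of concurrent fetch threads.
- `--timeout`: request timeout in seconds.
- `--csv`: if given, top-level comments and replies are appended to this CSV as `username,text` rows.
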
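Merging the three keyword sources could be sketched with `argparse` as below. The option names come from this README, but the helper `collect_keywords`, the order in which sources are merged, and the whitespace trimming are assumptions made for illustration.

```python
import argparse


def collect_keywords(args, file_lines=()):
    """Merge keywords from all three sources, dropping blanks and duplicates."""
    candidates = list(args.keyword or [])
    if args.keywords:
        candidates.extend(args.keywords.split(","))
    candidates.extend(file_lines)
    merged = []
    seen = set()
    for raw in candidates:
        kw = raw.strip()
        if kw and kw not in seen:
            seen.add(kw)
            merged.append(kw)
    return merged


parser = argparse.ArgumentParser()
parser.add_argument("--keyword", action="append", help="repeatable single keyword")
parser.add_argument("--keywords", help="comma-separated keywords")
parser.add_argument("--keywords-file", help="file with one keyword per line")

args = parser.parse_args(
    ["--keyword", "acrylic marker", "--keywords", "posca, acrylic marker"]
)
# Lines that would normally be read from the file named by args.keywords_file
file_lines = ["liquitex acrylic marker\n", "posca\n", "\n"]
keywords = collect_keywords(args, file_lines)
print(keywords)
```

With `action="append"`, `--keyword` can be given any number of times and arrives as a list, which is what makes mixing it with the other two sources straightforward.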
## Output files

- `data/urls.json`: link search snapshot, containing `keywords`, `items`, `total_count`, and `links`.
- `data/tik_comments.json`: comment snapshot, containing `items` (each with `link`, `count`, and `comments`).
- `data/comments.csv`: comments and replies in CSV format (username, text).

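## FAQ

- Empty responses or errors: first check that the cURL in `data/1.text` is still valid and that the `cookie` has not expired.
- Rate limiting: lower `--workers`, raise `--timeout`, or run in batches.
- Windows paths: the examples use backslashes in paths; on Unix-like systems, change them to `/`. The trailing `\` line continuations are Unix-shell syntax; in `cmd.exe`, use `^` instead or put the command on one line.
- Progress output: the run prints START/DONE/ERROR plus comment statistics so you can follow its state.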
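
The rate-limiting and progress-output points can be pictured with a small sketch of a bounded worker pool that processes links in batches. It uses `concurrent.futures` from the standard library as a stand-in; the project itself may use `threading` directly, and `run_batch`, `fake_fetch`, and the exact log format are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def _tracked(fetch, link):
    print(f"START {link}")
    return fetch(link)


def run_batch(links, fetch, workers=4):
    """Run `fetch(link)` across a bounded thread pool and log progress."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(_tracked, fetch, link): link for link in links}
        for future in as_completed(futures):
            link = futures[future]
            try:
                results[link] = future.result()
                print(f"DONE  {link} comments={len(results[link])}")
            except Exception as exc:  # one failed link should not stop the rest
                errors[link] = exc
                print(f"ERROR {link} {exc}")
    return results, errors


def fake_fetch(link):
    # Hypothetical stand-in for the real network call
    if link.endswith("/bad"):
        raise RuntimeError("empty response")
    return ["comment"] * 3


links = [f"https://www.tiktok.com/@user/video/{n}" for n in range(5)]
links.append("https://www.tiktok.com/@user/video/bad")

all_results, all_errors = {}, {}
batch_size = 3  # smaller batches and fewer workers ease rate limiting
for i in range(0, len(links), batch_size):
    res, err = run_batch(links[i:i + batch_size], fake_fetch, workers=2)
    all_results.update(res)
    all_errors.update(err)
print(len(all_results), len(all_errors))
```

Collecting errors per link instead of raising lets a later run retry only the links that failed, for example after refreshing the cookie.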
10
__init__.py
Normal file
@@ -0,0 +1,10 @@
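"""crawler_tiktok package

Collects video links from TikTok and fetches their comments and replies.
Modules:
- core: basic capabilities (such as parsing URLs and request headers from curl text)
- tiktok: TikTok-specific scraping logic (search, comments)
- utils: general-purpose IO helpers (JSON/CSV read and write, keyword file loading)
- data: snapshot writer (store) and sample data files
Entry point: main.py provides the command-line subcommands links/comments/all.
"""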
BIN
__pycache__/__init__.cpython-312.pyc
Normal file
Binary file not shown.
BIN
__pycache__/main.cpython-312.pyc
Normal file
Binary file not shown.
BIN
core/__pycache__/curl.cpython-312.pyc
Normal file
Binary file not shown.
BIN
core/__pycache__/har.cpython-312.pyc
Normal file
Binary file not shown.
BIN
core/__pycache__/store.cpython-312.pyc
Normal file
Binary file not shown.
74
core/curl.py
Normal file
@@ -0,0 +1,74 @@
"""curl text parsing and request sending helpers

Responsibilities:
- Split text containing multiple curl commands into blocks
- Parse the URL and request headers (including Cookie) from each block
- Send a GET request based on the parsed result and try to return JSON
"""
import re
import json
from urllib.request import Request, urlopen


def _split_curl_blocks(text):
    """Split the text into command blocks at each occurrence of the `curl ` keyword"""
    blocks = []
    indices = [m.start() for m in re.finditer(r"\bcurl\s", text)]
    if not indices:
        return blocks
    for i, start in enumerate(indices):
        end = indices[i + 1] if i + 1 < len(indices) else len(text)
        blocks.append(text[start:end])
    return blocks


def _parse_block(block):
    """Parse the URL and headers from a single curl command block

    Returns `{'url': str, 'headers': dict}`, or None if no URL can be parsed
    """
    # The backreference \1 requires the closing quote to match the opening one,
    # so a value such as `"Chromium";v="142"` inside single quotes is kept whole.
    url_m = re.search(r"curl\s+(['\"])(.*?)\1", block, re.S)
    if not url_m:
        return None
    url = url_m.group(2)
    headers = {}
    for hm in re.finditer(r"-H\s+(['\"])([^:]+?):\s*(.*?)\1", block):
        k = hm.group(2).strip()
        v = hm.group(3).strip()
        headers[k.lower()] = v
    # `-b` carries the cookie string; store it as a regular header
    cm = re.search(r"-b\s+(['\"])(.*?)\1", block, re.S)
    if cm:
        headers['cookie'] = cm.group(2)
    return {'url': url, 'headers': headers}


def parse_curl_file(file_path):
    """Read a curl text file and parse it into a list of request descriptions

    Args: `file_path` is the path of the file
    Returns: a list whose items each contain `url` and `headers`
    """
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    blocks = _split_curl_blocks(text)
    result = []
    for b in blocks:
        parsed = _parse_block(b)
        if parsed:
            result.append(parsed)
    return result


def fetch_from_curl(file_path, index=0, timeout=30):
    """Pick a parsed request by index and send it as a GET

    Args: `index` selects the curl block; `timeout` is the request timeout in seconds
    Returns: the parsed JSON if possible, otherwise the raw bytes;
    None if `index` is out of range
    """
    reqs = parse_curl_file(file_path)
    if not reqs or index < 0 or index >= len(reqs):
        return None
    item = reqs[index]
    req = Request(item['url'], headers=item['headers'], method='GET')
    with urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    try:
        return json.loads(data.decode('utf-8', errors='ignore'))
    except ValueError:
        # Not valid JSON: hand back the raw response body
        return data
28
data/1.text
Normal file
@@ -0,0 +1,28 @@
curl 'https://www.tiktok.com/api/comment/list/?WebIdLastTime=1762843898&aid=1988&app_language=en&app_name=tiktok_web&aweme_id=7554313767425985806&browser_language=zh-CN&browser_name=Mozilla&browser_online=true&browser_platform=Win32&browser_version=5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F142.0.0.0%20Safari%2F537.36&channel=tiktok_web&cookie_enabled=true&count=20&cursor=0&data_collection_enabled=true&device_id=7571356851431851534&device_platform=web_pc&focus_state=true&from_page=video&history_len=4&is_fullscreen=false&is_page_visible=true&odinId=7571357724144124941&os=windows&priority_region=US&referer=https%3A%2F%2Fwww.google.com%2F®ion=US&root_referer=https%3A%2F%2Fwww.google.com%2F&screen_height=1080&screen_width=1920&tz_name=Asia%2FShanghai&user_is_login=true&verifyFp=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y&webcast_language=en&msToken=HGxJ50IHbcJxidsAy4biksW4jCfUpjG5IOfoNZd9m24WyL0muFz7f02qUT2A4HCKPQPheRtCr66460XMCJQ9mCplXR1zk1fKK81mU65TLKdczaVqauDay_1qAol348Cg_iQaiK74qPZ0EoJBKu_iZbCATA==&X-Bogus=DFSzsIVuMqxANCCACObDM-ZLJrPe&X-Gnarly=MxLz6L1B1jRWAkdMAhXfoRcZW7o7E89jQUXQepysu7jSC47hCDAgLaFj6ATg13br-ct2WppjvVuo3DrB5foJoo3XOjJH6TVzfVkPLs8Sw47ja/0uC5DB6DPtfPWekO9g9-ZviZeREnpG/N2SRXbqDr0-Go5o0OzoRp9wdRUoLSAM5nbo0niphLjDxyOzdsW/RqxqQNFbBnJJkNqIH4TiXbQmNafqX1Yk5cGaIFH7FHcjZsYRbtA2gc8cXePp2guxR5cXDepaBF2Wgsdmu8VM2q8ed8o7ohQXi56GiUuUXjLJJ130mlhYHJHgYHtxYynbeRw=' \
-H 'accept: */*' \
-H 'accept-language: zh-CN,zh;q=0.9' \
-b 'passport_csrf_token=fc48fa188f4d67baad733476f64baccf; passport_csrf_token_default=fc48fa188f4d67baad733476f64baccf; last_login_method=email; delay_guest_mode_vid=5; living_user_id=480687482446; tiktok_webapp_theme_source=auto; tt_chain_token=tTxp2ztaIaxTaXsGrJb+8Q==; tiktok_webapp_lang=zh-Hans; d_ticket=64bffcf490d4c9c7839c94bfa06f09cf36a43; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763362651%7C2cd9c33c1a5733387a276a04d1af0af2026a80003217b55a841cb13520f1347f; myCookie=rap; fblo_1862952583919182=y; tiktok_webapp_theme=light; multi_sids=7571357724144124941%3Ab0685b23eb2eeb5f5e0a5604801f365b; cmpl_token=AgQQAPNSF-RO0rksMxu3N50083NFwqLP_4zZYKPv8w; sid_guard=b0685b23eb2eeb5f5e0a5604801f365b%7C1763435980%7C15552000%7CSun%2C+17-May-2026+03%3A19%3A40+GMT; uid_tt=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; uid_tt_ss=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; sid_tt=b0685b23eb2eeb5f5e0a5604801f365b; sessionid=b0685b23eb2eeb5f5e0a5604801f365b; sessionid_ss=b0685b23eb2eeb5f5e0a5604801f365b; tt_session_tlb_tag=sttt%7C1%7CsGhbI-su619eClYEgB82W__________pPerzW7vBXrU5jjfVhTeyC35OugALhjpAGeyHj7inp30%3D; sid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; ssid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; store-idc=useast5; store-country-code=us; store-country-code-src=uid; tt-target-idc=useast8; 
tt-target-idc-sign=jh5EgVgJzs2Zhfd2qcTVn1799rd4vK_dkjNT5E1hkY8ey6Iuuuo2qPfonmOJN-73-SEUPvtKH8L04xVHFdfuxGD4jEaT9iJGdE634_1n9RDR1aV8X7xR36LWBtgYwCfK96M28ozQElXgVDFsHS5jNIH0Jfq5gaisBGFWCAz7zEHd3YwWFjSVW96udWsQjGHM_y0UYLKmGEwmNh3nCmKOGntgfvFHrzuxfYL2T6upJ8x8WMb3GG-tGXKw4N05kaWH3LJCY3hGiCOIBX3s8_n0jvn1PLu9yiOUiF2f-K63HqcnVxPsuYfg8iYE47R08TALOuvQE9CFArejv2TMFIjiGKi-MB3BlWwwCUmUSbTyoTLBzSuKT5Elynh0l1JVMxrXlqZn39OMcs8_AB0n_RyyAF3FH9pV0sQ2lza6iJtZim0hmnAuTn8C26lyuss0BP3vlSPtp26rNMyqs1uZlpIclUzI7hmlBV6HNph2l1oBp7QMbvCQLFIGYMJXTL5kP_kX; tt_csrf_token=2alNSmcj-CbgYZfsy7TgGoLanfvaF_S5Ya4M; passport_fe_beating_status=true; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763542066%7C310b419382cc569f618a86d012ca09f9c470719d9734b7ef5e637af0851cc2fc; odin_tt=081974ccd8c05aa99c9def51a2ae7bff2f2eff4b7187eaefc9035f6f29b3cf23b9e7e874eddd540c813b1aeacd524755eeea3b487b1604ab69034829839420a35eb841c2c1660309fb2e8733f8cec76e; store-country-sign=MEIEDEmOSATz1HVkE69LLQQgQ-xv4GcuilURIjdWweEXq86fX22G46h6wFIvL1YxVVUEEOiSOHWEhgwxBRqtF3ZvKh4; s_v_web_id=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y; msToken=HGxJ50IHbcJxidsAy4biksW4jCfUpjG5IOfoNZd9m24WyL0muFz7f02qUT2A4HCKPQPheRtCr66460XMCJQ9mCplXR1zk1fKK81mU65TLKdczaVqauDay_1qAol348Cg_iQaiK74qPZ0EoJBKu_iZbCATA==; msToken=jWvit9euZoPFKgYZamkCxcNRJFbxE2efJZTwjJRsUOFUCqfJXFXMDMp6zFbk7VT7czHFPTMr0hdTmSdJybWT0mwnHYYrra7EQWKq9cwTr0NAdoyzkp-xrBuSlxD6a4Fcai6a72-qqTi1FxkAWjmrliiqwg==' \
-H 'priority: u=1, i' \
-H 'referer: https://www.tiktok.com/@user73001399001191/video/7561810082636287246' \
-H 'sec-ch-ua: "Chromium";v="142", "Google Chrome";v="142", "Not_A Brand";v="99"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Windows"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36'
curl 'https://www.tiktok.com/api/search/general/full/?WebIdLastTime=1762843898&aid=1988&app_language=en&app_name=tiktok_web&browser_language=zh-CN&browser_name=Mozilla&browser_online=true&browser_platform=Win32&browser_version=5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F142.0.0.0%20Safari%2F537.36&channel=tiktok_web&client_ab_versions=70508271%2C72437276%2C73547759%2C73720540%2C74444736%2C74446915%2C74465410%2C74627577%2C74679798%2C74703728%2C74744616%2C74746519%2C74757744%2C74780477%2C74782564%2C74793838%2C74798355%2C74803471%2C74808328%2C74824020%2C74843467%2C74852654%2C74860161%2C74879745%2C74879783%2C74882809%2C74891662%2C74902367%2C74926160%2C74928117%2C74935708%2C74936938%2C74970253%2C74972148%2C74973673%2C74976255%2C74980175%2C74983940%2C74994853%2C75001423%2C75005876%2C70405643%2C70772958%2C71057832%2C71200802%2C71381811%2C71516509%2C71803300%2C71962127%2C72360691%2C72361743%2C72408100%2C72854054%2C72892778%2C73171280%2C73208420%2C73989921%2C74276218%2C74611443%2C74844724&cookie_enabled=true&count=16&data_collection_enabled=true&device_id=7571356851431851534&device_platform=web_pc&device_type=web_h265&focus_state=true&from_page=search&history_len=5&is_fullscreen=false&is_page_visible=true&keyword=%E6%B6%82%E9%B8%A6%E7%BB%98%E7%94%BB&odinId=7571357724144124941&offset=0&os=windows&priority_region=US&referer=https%3A%2F%2Fwww.google.com%2F®ion=US&root_referer=https%3A%2F%2Fwww.google.com%2F&screen_height=1080&screen_width=1920&search_source=search_history&tz_name=Asia%2FShanghai&user_is_login=true&verifyFp=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y&web_search_code=%7B%22tiktok%22%3A%7B%22client_params_x%22%3A%7B%22search_engine%22%3A%7B%22ies_mt_user_live_video_card_use_libra%22%3A1%2C%22mt_search_general_user_live_card%22%3A1%7D%7D%2C%22search_server%22%3A%7B%7D%7D%7D&webcast_language=en&msToken=Evo6YZn35dd6dAsaNUBm7WHaOCixR84Hwjo6DVlNBGE0L56xiDF_dmDWfyJIJGq8LDEsjNm5G9H3uMP9LlV
sCunVwx0lMEnriQWWWuzpN7Xp4j0Fj5wXbgMEqU9KMd5YfkZ1iqFubWhu99nvT06p5qpUeg==&X-Bogus=DFSzsIVu7-iANCCACObDh-ZLJrOb&X-Gnarly=M5J8rVZ10jjW3H5JvTrrLC6MGn7Qq4X0NFfuLZ1UYP2F5Tyem4CUCigEriTnj4Ui3kZdlhNogxKstfzoLHWeSKWiubEdsYZpiegkx-Ot2OUSwbyC9mcwB8T80j7nJpzf6tMOisjjbGiGzYQDJuNrqgxehrCDKUfdA6CbLeoguoGy7XQjTDxmg3/VsSqdhziaenBlm72xVj0GyLUEgrboEwzp11Xphma3Qo8b-/uiMZWQDNyJaC7rcb11dW-ffpSTMrXvf6EU6QXJav2NYvS2gNjMJBhMf15s0-NNQQIC-USLgeAWEo5Wj-gXgn/YmaUd7hZ=' \
-H 'accept: */*' \
-H 'accept-language: zh-CN,zh;q=0.9' \
-b 'passport_csrf_token=fc48fa188f4d67baad733476f64baccf; passport_csrf_token_default=fc48fa188f4d67baad733476f64baccf; last_login_method=email; delay_guest_mode_vid=5; living_user_id=480687482446; tiktok_webapp_theme_source=auto; tt_chain_token=tTxp2ztaIaxTaXsGrJb+8Q==; tiktok_webapp_lang=zh-Hans; d_ticket=64bffcf490d4c9c7839c94bfa06f09cf36a43; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763362651%7C2cd9c33c1a5733387a276a04d1af0af2026a80003217b55a841cb13520f1347f; myCookie=rap; fblo_1862952583919182=y; tiktok_webapp_theme=light; multi_sids=7571357724144124941%3Ab0685b23eb2eeb5f5e0a5604801f365b; cmpl_token=AgQQAPNSF-RO0rksMxu3N50083NFwqLP_4zZYKPv8w; sid_guard=b0685b23eb2eeb5f5e0a5604801f365b%7C1763435980%7C15552000%7CSun%2C+17-May-2026+03%3A19%3A40+GMT; uid_tt=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; uid_tt_ss=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; sid_tt=b0685b23eb2eeb5f5e0a5604801f365b; sessionid=b0685b23eb2eeb5f5e0a5604801f365b; sessionid_ss=b0685b23eb2eeb5f5e0a5604801f365b; tt_session_tlb_tag=sttt%7C1%7CsGhbI-su619eClYEgB82W__________pPerzW7vBXrU5jjfVhTeyC35OugALhjpAGeyHj7inp30%3D; sid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; ssid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; store-idc=useast5; store-country-code=us; store-country-code-src=uid; tt-target-idc=useast8; 
tt-target-idc-sign=jh5EgVgJzs2Zhfd2qcTVn1799rd4vK_dkjNT5E1hkY8ey6Iuuuo2qPfonmOJN-73-SEUPvtKH8L04xVHFdfuxGD4jEaT9iJGdE634_1n9RDR1aV8X7xR36LWBtgYwCfK96M28ozQElXgVDFsHS5jNIH0Jfq5gaisBGFWCAz7zEHd3YwWFjSVW96udWsQjGHM_y0UYLKmGEwmNh3nCmKOGntgfvFHrzuxfYL2T6upJ8x8WMb3GG-tGXKw4N05kaWH3LJCY3hGiCOIBX3s8_n0jvn1PLu9yiOUiF2f-K63HqcnVxPsuYfg8iYE47R08TALOuvQE9CFArejv2TMFIjiGKi-MB3BlWwwCUmUSbTyoTLBzSuKT5Elynh0l1JVMxrXlqZn39OMcs8_AB0n_RyyAF3FH9pV0sQ2lza6iJtZim0hmnAuTn8C26lyuss0BP3vlSPtp26rNMyqs1uZlpIclUzI7hmlBV6HNph2l1oBp7QMbvCQLFIGYMJXTL5kP_kX; tt_csrf_token=2alNSmcj-CbgYZfsy7TgGoLanfvaF_S5Ya4M; passport_fe_beating_status=true; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763542066%7C310b419382cc569f618a86d012ca09f9c470719d9734b7ef5e637af0851cc2fc; odin_tt=081974ccd8c05aa99c9def51a2ae7bff2f2eff4b7187eaefc9035f6f29b3cf23b9e7e874eddd540c813b1aeacd524755eeea3b487b1604ab69034829839420a35eb841c2c1660309fb2e8733f8cec76e; store-country-sign=MEIEDEmOSATz1HVkE69LLQQgQ-xv4GcuilURIjdWweEXq86fX22G46h6wFIvL1YxVVUEEOiSOHWEhgwxBRqtF3ZvKh4; s_v_web_id=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y; msToken=imJTbrm-SxNJWwT4U4KOTLwsN5UnjQQzm-bHzXVVdiKnymtLbyQXbM_dziPgdrG3rcvbHJf_WWoBIuJ8AegguxYLz_gA0dQNa8suc7aNA3RA_Z7FqCUJd7iO4TJX5lU3dG0Ahchd28z0ip0HADyOygq2sQ==; msToken=Na9bB8_PXmSHmEprxHatJ3iAi6DYMS4DGaKymzxCSv5ho-vkxkGi2Oh4LHpL3LntQloywKO0p5gTBoGjA3BKW7uguLGHfS6FiTPzo5JkbwAMPkYyGdaoQh3yikWSJFGPNMpdrwaN8-ta5IcfWZlfv_QQcQ==' \
-H 'priority: u=1, i' \
-H 'referer: https://www.tiktok.com/search?q=%E6%B6%82%E9%B8%A6%E7%BB%98%E7%94%BB&t=1763542137936' \
-H 'sec-ch-ua: "Chromium";v="142", "Google Chrome";v="142", "Not_A Brand";v="99"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Windows"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36'
BIN
data/__pycache__/store.cpython-312.pyc
Normal file
Binary file not shown.
88957
data/comments.csv
Normal file
File diff suppressed because one or more lines are too long
5548
data/key_comment.csv
Normal file
File diff suppressed because it is too large
596
data/keyword.txt
Normal file
@@ -0,0 +1,596 @@
|
||||
direct liquid soft head acrylic marker pen
|
||||
guangna direct liquid soft head acrylic marker pen
|
||||
guangna direct liquid soft head acrylic marker pen 24 colors
|
||||
@huangaa_3 guangna direct liquid soft head acrylic marker pen
|
||||
how to draw with water based markers
|
||||
acrylic marker holder
|
||||
acrylic pen markers
|
||||
liquid acrylic pens
|
||||
direct liquid pen
|
||||
direct liquid acrylic marker pen japan
|
||||
direct liquid acrylic marker pen japanese
|
||||
direct liquid acrylic marker pen jumbo
|
||||
direct liquid acrylic marker pen
|
||||
acrylic marker pens
|
||||
liquitex acrylic markers review
|
||||
acrylic marker art
|
||||
liquitex acrylic marker
|
||||
direct liquid acrylic marker pen instructions
|
||||
direct liquid acrylic marker pen ink
|
||||
direct liquid acrylic marker pen ii
|
||||
direct liquid acrylic marker pen instructions pdf
|
||||
direct liquid acrylic marker pen in usa
|
||||
direct liquid acrylic marker pen msds
|
||||
direct liquid acrylic marker pen markers
|
||||
direct liquid acrylic marker pen michaels
|
||||
direct liquid acrylic marker pen msds sheet
|
||||
direct liquid acrylic marker pen large
|
||||
direct liquid acrylic marker pen lowes
|
||||
direct liquid acrylic marker pen label
|
||||
direct liquid acrylic marker pen liquid
|
||||
direct liquid acrylic marker pen light
|
||||
direct liquid acrylic marker pen liner
|
||||
direct liquid acrylic marker pen liner review
|
||||
what pen to use to write on acrylic
|
||||
liquid marker pen
|
||||
direct liquid acrylic marker pen kit
|
||||
direct liquid acrylic marker pen kit instructions
|
||||
direct liquid acrylic marker pen orange
|
||||
direct liquid acrylic marker pen only
|
||||
direct liquid acrylic marker pen on amazon
|
||||
direct liquid acrylic marker pen oil
|
||||
direct liquid acrylic marker pen on sale
|
||||
direct liquid acrylic marker pen off clothes
|
||||
direct liquid acrylic marker pen off golf balls
|
||||
direct liquid acrylic marker pen on walls
|
||||
direct liquid acrylic marker pen pack
|
||||
direct liquid acrylic marker pen pink
|
||||
direct liquid acrylic marker pen pen
|
||||
direct liquid acrylic marker pen purple
|
||||
direct liquid acrylic marker pen price
|
||||
direct liquid acrylic marker pen paper mate
|
||||
direct liquid acrylic marker pen quality
|
||||
direct liquid acrylic marker pen quick dry
|
||||
direct liquid acrylic marker pen qvc
|
||||
direct liquid acrylic marker pen quizlet
|
||||
direct liquid acrylic marker pen quiz
|
||||
direct liquid acrylic marker pen refill
|
||||
direct liquid acrylic marker pen review
|
||||
direct liquid acrylic marker pen reddit
|
||||
direct liquid acrylic marker pen red
|
||||
direct liquid acrylic marker pen target
|
||||
direct liquid acrylic marker pen tip
|
||||
direct liquid acrylic marker pen type
|
||||
direct liquid acrylic marker pen tint
|
||||
direct liquid acrylic marker pen use
|
||||
direct liquid acrylic marker pen usa
|
||||
direct liquid acrylic marker pen us
|
||||
direct liquid acrylic marker pen walmart
|
||||
direct liquid acrylic marker pen white
|
||||
direct liquid acrylic marker pen wholesale
|
||||
direct liquid acrylic marker pen waterproof
|
||||
direct liquid acrylic marker pen walgreens
|
||||
direct liquid acrylic marker pen video
|
||||
direct liquid acrylic marker pen vintage
|
||||
direct liquid acrylic marker pen vs
|
||||
direct liquid acrylic marker pen volume
|
||||
direct liquid acrylic marker pen vs regular
|
||||
direct liquid acrylic marker pen vape
|
||||
direct liquid acrylic marker pen xl
|
||||
direct liquid acrylic marker pen x2
|
||||
acrylic marker diy
|
||||
diy acrylic paint markers
|
||||
direct liquid acrylic marker pen zoom
|
||||
direct liquid acrylic marker pen zero
|
||||
direct liquid acrylic marker pen zipper
|
||||
direct liquid acrylic marker pen zip
|
||||
direct liquid acrylic marker pen zoominfo
|
||||
acrylic marker paper
|
||||
acrylic marker painting
|
||||
acrylic liquid pens
|
||||
acrylic markers blick
|
||||
what markers can you use on acrylic
|
||||
are liquid chalk markers permanent
|
||||
acrylic marker edding
|
||||
direct liquid acrylic marker pen amazon
|
||||
direct liquid acrylic marker pen amazon prime
|
||||
direct liquid acrylic marker pen acrylic
|
||||
direct liquid acrylic marker pen app
|
||||
direct liquid acrylic marker pen art
|
||||
acrylic marker fine tip
|
||||
acrylic paint markers fine tip
|
||||
are sharpie gel pens waterproof
|
||||
liqui-mark gel pens
|
||||
acrylic marker graffiti
|
||||
acrylic paint markers how to use
|
||||
acrylic ink marker
|
||||
krink acrylic markers
|
||||
permanent marker on acrylic plastic
|
||||
liqui-mark permanent markers
|
||||
liquitex acrylic paint markers
|
||||
marker acrylic
|
||||
acrylic paint marker waterproof
|
||||
acrylic pen marker
|
||||
direct liquid acrylic marker pen black
|
||||
direct liquid acrylic marker pen bulk
|
||||
direct liquid acrylic marker pen blue
|
||||
direct liquid acrylic marker pen brand
|
||||
direct liquid acrylic marker pen brand name
|
||||
direct liquid acrylic marker pen bleeding through paper
|
||||
direct liquid acrylic marker pen bleeding through paint
|
||||
direct liquid acrylic marker pen brush photoshop
|
||||
oil based marker on acrylic paint
|
||||
acrylic markers on plastic
|
||||
acrylic paint marker refill
|
||||
are sharpies water based
|
||||
are sharpies oil or water based
|
||||
acrylic marker liquitex
|
||||
liquitex acrylic pen
|
||||
direct liquid acrylic marker pen directions
|
||||
direct liquid acrylic marker pen dollar tree
|
||||
direct liquid acrylic marker pen dispenser
|
||||
direct liquid acrylic marker pen directions for use
|
||||
direct liquid acrylic marker pen dry
|
||||
direct liquid acrylic marker pen disguises
|
||||
direct liquid acrylic marker pen depot
|
||||
direct liquid acrylic marker pens fine line
|
||||
direct liquid acrylic marker pens fine point
|
||||
direct liquid acrylic marker pens fine tip
|
||||
direct liquid acrylic marker pens for painting
|
||||
direct liquid acrylic marker pen ebay
|
||||
direct liquid acrylic marker pen ewg
|
||||
direct liquid acrylic marker pen elite
|
||||
direct liquid acrylic marker pen elite review
|
||||
direct liquid acrylic marker pen elite vaporizer
|
||||
uni acrylic markers
|
||||
using acrylic markers
|
||||
what marker to use on acrylic
|
||||
what type of marker to use on acrylic
|
||||
what marker works on plastic
|
||||
where to buy acrylic markers
|
||||
where to buy acrylic paint pens
|
||||
are acrylic markers permanent
|
||||
acrylic marker uses
|
||||
acrylic marker tutorial
|
||||
can you use permanent marker on acrylic paint
|
||||
who acrylic markers
|
||||
who sells liquitex acrylic paint
|
||||
are acrylic markers water based
|
||||
is acrylic liquid monomer
|
||||
are acrylic pens oil based
|
||||
are acrylic paint markers waterproof
|
||||
can you use permanent marker on acrylic
|
||||
can dry erase markers be used on acrylic
|
||||
can acrylic markers be used on fabric
|
||||
acrylic vs oil marker
|
||||
acrylic markers vs acrylic paint
|
||||
acrylic paint marker vs oil paint marker
|
||||
are acrylic paint markers permanent
|
||||
will dry erase markers work on plexiglass
|
||||
will permanent marker stick to plastic
|
||||
will permanent marker stay on silicone
|
||||
does dry erase markers work on plexiglass
|
||||
worst acrylic paint
|
||||
worst acrylic paint brands
|
||||
worst acrylic powder
|
||||
art supplies acrylic marker testing comparison
|
||||
acrylic paint markers permanent
|
||||
do dry erase markers work on acrylic
|
||||
do acrylic markers work on fabric
|
||||
do acrylic paint pens work on plastic
|
||||
do dry erase markers work on plexiglass
|
||||
best acrylic markers
|
||||
best acrylic markers for artists
|
||||
best acrylic paint pens for plastic
|
||||
top rated acrylic paint markers
|
||||
top rated acrylic paint pens
|
||||
can acrylic paint markers be used on canvas
|
||||
acrylic marker pens
|
||||
acrylic marker pen set
|
||||
acrylic marker pen price
|
||||
acrylic marker pen uses
|
||||
acrylic marker pen drawing
|
||||
acrylic marker pens uk
|
||||
acrylic marker pen black
|
||||
acrylic marker pen 24 shades
|
||||
acrylic marker pen art
|
||||
acrylic marker pen painting
|
||||
acrylic marker pens hobbycraft
|
||||
acrylic marker pen amazon
|
||||
acrylic marker pen mr diy
|
||||
acrylic marker pen for fabric
|
||||
acrylic marker pen how to use
|
||||
acrylic pen and marker holder
|
||||
acrylic marker pen himic
|
||||
acrylic marker pens home bargains
|
||||
acrylic marker pens hobby lobby
|
||||
acrylic paint pen hobby lobby
|
||||
acrylic paint pen holder
|
||||
acrylic paint pen hobbycraft
|
||||
acrylic paint pen home depot
|
||||
acrylic paint pen how to use
|
||||
acrylic paint pens home bargains
|
||||
acrylic paint pens hs code
|
||||
best acrylic paint pens hobbycraft
acrylic marker pen gold
acrylic marker pens guangna
acrylic paint pen gold
acrylic paint pen green
acrylic paint pen grey
acrylic paint pen glass
acrylic paint pen grey set
acrylic paint pen golf ball
acrylic paint pen graffiti
acrylic paint pens grabie
acrylic paint pens glitter
acrylic paint pens guangna
acrylic paint pens gundam
acrylic paint pens gunpla
acrylic paint pen ideas
acrylic paint pen ideas for beginners
acrylic paint pen icon
acrylic paint pens ireland
acrylic paint pens in national bookstore
acrylic paint pens in store
acrylic paint pens india
acrylic paint in pen
acrylic paint pen art ideas
acrylic paint pen drawing ideas
easy acrylic paint pen ideas
which acrylic paint pen is best
acrylic paint pen craft ideas
acrylic paint pens ak interactive
acrylic marker pen flair
acrylic marker pen faber castell
acrylic marker pen fine tip
acrylic marker pen for kids
acrylic marker pen flipkart
acrylic marker pens for glass
acrylic paint pen fine tip
acrylic paint pen for fabric
acrylic paint pen for wood
acrylic paint pen flowers
acrylic paint pen for art and crafts
acrylic paint pen for canvas
acrylic paint pen for glass
acrylic marker pen shopee
acrylic paint pen set
acrylic paint pen storage
acrylic marker sketch pen
acrylic paint pen set uk
acrylic paint pen silver
acrylic paint pen sealer
acrylic paint pen spotlight
acrylic paint pen set nearby
acrylic paint sketch pen
acrylic paint marker pen set
acrylic marker 12 pen set flair brand
acrylic marker pen posca
acrylic marker pen popular
acrylic marker pen peak
acrylic paint pen projects
acrylic paint pen posca
acrylic paint pen price
acrylic paint pen painting
acrylic paint pen pink
acrylic paint pens pna
acrylic paint pens permanent
acrylic paint pens pastel
acrylic paint pens professional
acrylic paint pen joann
acrylic paint pens jumbo
acrylic pen ideas
what pens write on acrylic
do acrylic paint pens work on plastic
what are acrylic paint pens used for
acrylic marker pen ohuhu
acrylic paint pen on fabric
acrylic paint pen on glass
acrylic paint pen officeworks
acrylic paint pen on canvas
acrylic paint pen on wood
acrylic paint pen on plastic
acrylic paint pen on mirror
acrylic paint pen on metal
acrylic paint pen on leather
acrylic paint pen organizer
acrylic paint pen on skin
acrylic paint pen on shirt
acrylic paint pen on ceramic
acrylic marker pen
acrylic marker pen white
acrylic marker pen near me
acrylic paint pens video
acrylic paint pens vs markers
acrylic paint pen vs sharpie
sharpie acrylic paint pens vs posca
acrylic paint pens vs alcohol markers
acrylic pen vs marker
acrylic pen vs permanent marker
acrylic marker vs brush pen
acrylic marker vs color pen
acrylic marker vs gel pen
acrylic paint pens velles
acrylic marker pen under ₹ 100
acrylic marker pen under 200
acrylic marker pen under ₹ 200
acrylic marker pen under ₹ 300
acrylic marker pen under 100
acrylic marker pen under ₹ 400
acrylic paint pen uses
acrylic paint pen ultra fine
acrylic paint pen uk
acrylic paint pens ultra fine tip
acrylic paint pens uk amazon
best acrylic marker pens uk
acrylic marker pens kmart
acrylic paint pen kmart
acrylic paint pen kits
acrylic paint pens kids
acrylic paint pens kuwait
acrylic paint pens nz kmart
acrylic paint dot pens kmart
kokuyo camlin acrylic marker pen
what are acrylic markers used for
does permanent marker stay on acrylic
acrylic paint pens not working
acrylic pens how to use
acrylic marker pen meesho
acrylic marker pen malaysia
acrylic marker pens michaels
acrylic paint pen michaels
acrylic paint pen mont marte
acrylic paint pen metallic
acrylic paint pen molotow
acrylic paint pen mug
acrylic paint pens miniatures
acrylic paint pens mr price
acrylic paint pens medium tip
acrylic paint pens mitre 10
acrylic paint pens michaels nearby
acrylic marker pen quality
acrylic marker pen quiz
acrylic marker pen quick dry
acrylic marker pen qvc
acrylic marker pen quick release
artecho acrylic marker pen
artecho dual tip acrylic marker pen
arrtx acrylic marker pen
acrylic marker vs acrylic paint pen
acrylic paint marker pen amazon
difference between acrylic marker and brush pen
akarued white paint pen acrylic marker
acrylic marker brush pen amazon
marker acrylic pen allegro
acrylic marker and brush pen
is an acrylic marker a paint pen
best acrylic marker pen
black acrylic marker pen
brustro acrylic marker pen
best white acrylic marker pen
baoke acrylic marker pen
acrylic marker brush pen
brush pen vs acrylic marker
acrylic brush marker pen set
acrylic paint marker brush pen
acrylic paint marker pen black
acrylic marker pen box
acrylic paint marker calligraphy brush pen
best acrylic paint marker pen
acrylic marker pens the works
acrylic marker pens the range
acrylic marker pens tesco
acrylic paint pen tooli art
acrylic paint pen tutorial
acrylic paint pen target
acrylic paint pen techniques
acrylic paint pen tips
acrylic paint pen thin
acrylic paint pen thick
acrylic paint pen the range
acrylic paint pen tool art
acrylic paint pen temu
acrylic paint pens the works
acrylic paint pen white
acrylic paint pen walmart
acrylic paint pen water based
acrylic paint pen waterproof
acrylic paint pen warhammer
acrylic paint pen white michaels
acrylic marker with pen
acrylic paint pen washable
acrylic paint pen wood
acrylic paint pens warehouse
acrylic paint pens with brush tip
whsmith acrylic paint pens
acrylic paint pens wholesale
acrylic marker pen refill
acrylic marker pen review
acrylic paint pen refillable
acrylic paint pen reviews
acrylic paint pen removal
acrylic paint pen red
acrylic paint pen reddit
acrylic paint pens range
acrylic paint pens rocks
acrylic paint pens rymans
acrylic paint pens reject shop
acrylic paint pens red dot
acrylic paint pen for resin
acrylic marker pen xl
acrylic marker pen xray
acrylic marker pen xtool
guangna direct liquid soft head acrylic marker pen
gold acrylic marker pen
grasp acrylic marker pen
guangna acrylic marker pen
grabie acrylic marker pen
languo acrylic marker gel pen
acrylic marker brush pen guangna
marker acrylic pen m&g
goffi acrylic paint marker pen
doloha acrylic.marker pen
deli acrylic marker pen
doms acrylic marker pen
dual tip acrylic marker pen
direct liquid soft head acrylic marker pen
direct liquid acrylic marker pen
dual tip acrylic paint pen marker
double.sided acrylic pen marker
double sided acrylic pen marker set of 24
acrylic marker pen diy
dual tip acrylic paint pen marker - 24/48/72 colours
beyond draw-dual-tip-acrylic-paint-pen-marker
fine tip acrylic marker pen
flair acrylic marker pen
marker pen for acrylic painting
marker pen for acrylic board
acrylic marker pen used for
marker pen for acrylic
white marker pen for acrylic
flair acrylic paint marker pen
ohuhu acrylic marker pen for diy
camel acrylic marker pen
carissa acrylic marker pen
camlin acrylic marker pen
acrylic colour marker pen
acrylic marker pen china
acrylic marker pen in chinese
acrylic marker brush pen 80 cores
sharpie creative marker acrylic paint pen
acrylic marker brush pen 60 crore
caneta acrylic marker brush pen
miya acrylic marker pen
metallic acrylic marker pen
led acrylic writing message board night lamp with marker pen
acrylic marker brush pen mercado livre
b&m acrylic marker pens
sketching pens & markers acrylic marker pen
marcadores acrylic marker pen
set acrylic marker pen
soft head acrylic marker pen
silver acrylic marker pen
acrylic marker 12 pen set
acrylic marker pen 48 shades
acrylic marker pen 36 shades
acrylic marker pen 12 shades
pen peak acrylic marker pen
price acrylic marker pen
posca acrylic marker pen
paint acrylic marker pen
acrylic permanent marker pen
wotek acrylic paint marker pen
acrylic paint marker pen white
acrylic marker vs paint pen
thick acrylic marker pen
the acrylic paint marker pen set
acrylic tip marker pen
is an acrylic marker the same as a paint pen
how to use acrylic marker pen
enmy acrylic marker pen
emmy acrylic marker pen
marker acrylic pen empik
what are acrylic markers
are there acrylic paint pens
how to use acrylic markers
nicety acrylic marker pen
acrylic marker pen nearby
acrylic pen marker national bookstore
what markers can you use on acrylic
ohuhu acrylic marker pen
ohuhu acrylic marker pen price
unicorn acrylic marker pen
what is acrylic marker pen
what are acrylic paint markers used for
white acrylic marker pen
@huangaa_3 guangna direct liquid soft head acrylic marker pen
hightune acrylic marker brush pen
liquid soft head acrylic marker pen
how to use acrylic marker
how do acrylic markers work
languo acrylic marker pen
liquid acrylic marker pen
best acrylic marker pens
best acrylic paint marker pens
what are the best acrylic markers
what is the best acrylic paint pens
what are the best acrylic pens
acrylic marker pen ideas
acrylic marker pen price in bangladesh
is sharpie acrylic
acrylic paint pen zeyar
acrylic paint pens new zealand
acrylic marker brush pen zjw
acrylic marker pen zuixua
restly acrylic marker pen
how long do acrylic paint pens last
are acrylic paint pens permanent
are sharpies acrylic
acrylic paint pens lyuvie
acrylic paint pens like posca
acrylic paint pens liquitex
acrylic paint pens life of colour
acrylic paint pens lowes
acrylic paint pens large
acrylic paint pens large set
acrylic paint pens languo
acrylic paint pens for leather shoes
acrylic marker pen best
acrylic marker pen blinkit
acrylic marker pens b&m
acrylic paint pen black
acrylic paint pen brush tip
acrylic paint pen brands
acrylic paint pen brush
acrylic paint pen bunnings
acrylic paint pen brown
acrylic paint pen big w
acrylic paint pen blue
acrylic paint pen by numbers
acrylic marker pen doms
acrylic marker pen deli
acrylic marker pen dual tip
acrylic paint pen drawing
acrylic paint pen drying time
acrylic paint pen designs
acrylic paint pen doodles
acrylic paint pen dry
dual tip acrylic paint.pen
acrylic paint pens dried out
acrylic paint pens desire deluxe
acrylic marker pen camlin
acrylic marker pen camel
acrylic paint pen coloring book
acrylic paint pen crafts
acrylic paint pen case
acrylic paint pen canvas
acrylic marker colour pen
acrylic paint pen car
acrylic paint pen colouring book
acrylic paint pen canada
acrylic paint pens cheap
acrylic paint pens crockd
acrylic marker pen enmy
acrylic marker pens ebay
acrylic paint pen extra fine
acrylic paint pen extra fine tip
acrylic paint pen empty
acrylic paint pens ebay
acrylic paint pens earth tones
acrylic paint pens eckersley
acrylic paint pens enmy
bia acrylic paint pen extra fine tip
sharpie acrylic paint pens earth tones
artistro acrylic paint pens extra fine tip
acrylic paint pens double ended
acrylic marker pens arrtx
acrylic marker pens arrtx simptap
acrylic paint pen art
acrylic paint pen amazon
acrylic paint pen artwork
acrylic paint pens australia
acrylic paint pens argos
acrylic paint pens asda
acrylic paint pens artistro
acrylic paint pens arrtx
acrylic paint pens at michaels
18
data/links.json
Normal file
@@ -0,0 +1,18 @@
{
  "keyword": "马克笔绘画",
  "count": 12,
  "links": [
    "https://www.tiktok.com/@drawing_board8/video/7569235583214587150",
    "https://www.tiktok.com/@seekingartsupplier_my/video/7259632291306032402",
    "https://www.tiktok.com/@huangaa_3/video/7522044666745818382",
    "https://www.tiktok.com/@acrylicmarkerasmr/video/7470837517571345695",
    "https://www.tiktok.com/@fungraffiti/video/7550422681103895863",
    "https://www.tiktok.com/@yzd20328cuq/video/7553666411760012574",
    "https://www.tiktok.com/@acrylicmarkerasmr/video/7495580197811440926",
    "https://www.tiktok.com/@miss.uk3/video/7561345033010416916",
    "https://www.tiktok.com/@nashvibes/video/7472932371990416670",
    "https://www.tiktok.com/@muse1378/video/7322534704643722539",
    "https://www.tiktok.com/@miss.uk3/video/7567258402661960981",
    "https://www.tiktok.com/@miss.uk3/video/7569627806678846740"
  ]
}
25
data/store.py
Normal file
@@ -0,0 +1,25 @@
from utils.io import write_json

def save_links_snapshot(path, keywords, items, links):
    """Write the link snapshot.

    Structure: `{'keywords': list, 'items': list, 'total_count': int, 'links': list}`
    """
    write_json(path, {'keywords': list(keywords), 'items': items, 'total_count': len(links), 'links': all_links(links)})
    return path

def save_comments_snapshot(path, items):
    """Write the comment snapshot: `{'items': items}`."""
    write_json(path, {'items': items})
    return path

def all_links(links):
    """Convert any iterable of links to a list (for JSON serialization)."""
    return list(links)
"""Data snapshot writing utilities.

Responsibilities:
- Write link search results to JSON in a unified structure
- Write comment crawl results to JSON
Serialization only; no business logic.
"""
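As a rough illustration of the snapshot layout described in the docstrings above, the sketch below builds the same dictionary in memory and serializes it with the standard `json` module. `build_links_snapshot` is a hypothetical helper that is not part of this repository, and the sample keyword and URL are placeholders.

```python
import json


def build_links_snapshot(keywords, items, links):
    """Hypothetical helper mirroring the structure save_links_snapshot writes."""
    links = list(links)  # materialize once so total_count and the payload agree
    return {
        'keywords': list(keywords),
        'items': items,
        'total_count': len(links),
        'links': links,
    }


# Placeholder data, not taken from a real crawl.
snapshot = build_links_snapshot(
    ['acrylic marker pen'],
    [{'keyword': 'acrylic marker pen', 'count': 1,
      'links': ['https://www.tiktok.com/@user/video/123']}],
    ['https://www.tiktok.com/@user/video/123'],
)
print(json.dumps(snapshot, ensure_ascii=False, indent=2))
```

Materializing `links` before calling `len()` is a small design choice made here so that a generator argument would not be consumed twice; the repository function may rely on callers passing a list or set.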
8914177
data/tik_comments.json
Normal file
File diff suppressed because one or more lines are too long
39884
data/urls.json
Normal file
File diff suppressed because it is too large
BIN
db/__pycache__/mysql_import.cpython-312.pyc
Normal file
Binary file not shown.
49
db/mysql_import.py
Normal file
@@ -0,0 +1,49 @@
import csv
import os

def import_csv_to_mysql(csv_path, host='localhost', port=3306, user='root', password='', database='crawler_tiktok', table='comments'):
    try:
        import pymysql
    except Exception:
        print('missing dependency: pip install pymysql', flush=True)
        raise SystemExit(1)
    if not os.path.exists(csv_path):
        print('csv not found: ' + csv_path, flush=True)
        raise SystemExit(1)
    conn = pymysql.connect(host=host, port=int(port), user=user, password=password, database=database, charset='utf8mb4')
    cur = conn.cursor()
    cur.execute(f"CREATE TABLE IF NOT EXISTS `{table}` (\n `id` BIGINT AUTO_INCREMENT PRIMARY KEY,\n `username` VARCHAR(255),\n `text` TEXT\n ) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci")
    rows = []
    with open(csv_path, 'r', encoding='utf-8', newline='') as f:
        r = csv.reader(f)
        first = True
        for row in r:
            if first and row and row[0].lower() == 'username':
                first = False
                continue
            first = False
            if not row:
                continue
            username = row[0] if len(row) > 0 else ''
            text = row[1] if len(row) > 1 else ''
            rows.append((username, text))
    if rows:
        cur.executemany(f"INSERT INTO `{table}` (`username`,`text`) VALUES (%s,%s)", rows)
    conn.commit()
    cur.close()
    conn.close()
    print(f"inserted={len(rows)}", flush=True)

def create_database_if_not_exists(host='localhost', port=3306, user='root', password='', database='yunque'):
    try:
        import pymysql
    except Exception:
        print('missing dependency: pip install pymysql', flush=True)
        raise SystemExit(1)
    conn = pymysql.connect(host=host, port=int(port), user=user, password=password, charset='utf8mb4')
    cur = conn.cursor()
    cur.execute(f"CREATE DATABASE IF NOT EXISTS `{database}` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci")
    conn.commit()
    cur.close()
    conn.close()
    print(f"database_ready={database}", flush=True)
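The CSV-reading loop in `import_csv_to_mysql` can be tried without a database. The sketch below is an assumption-laden stand-in, not the repository code: `read_comment_rows` is a hypothetical name, and an in-memory `io.StringIO` replaces the real CSV file. It skips an optional `username` header and blank lines, and pads a missing `text` column with an empty string.

```python
import csv
import io


def read_comment_rows(f):
    """Hypothetical helper: return (username, text) tuples from a CSV file object."""
    rows = []
    for index, row in enumerate(csv.reader(f)):
        if not row:
            continue  # csv.reader yields [] for blank lines
        if index == 0 and row[0].lower() == 'username':
            continue  # skip the header row if present
        text = row[1] if len(row) > 1 else ''
        rows.append((row[0], text))
    return rows


# Sample data standing in for data/comments.csv.
sample = io.StringIO("username,text\nalice,nice pens\n\nbob\n")
rows = read_comment_rows(sample)
print(rows)
```

The resulting tuples have the shape that a parameterized `executemany` call with two `%s` placeholders expects.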
185
main.py
Normal file
@@ -0,0 +1,185 @@
import argparse
import os
from utils.io import load_keywords_from_file, read_json
from tiktok.search import save_links_multi
from tiktok.comments import save_comments_from_links
from db.mysql_import import import_csv_to_mysql, create_database_if_not_exists

def run_links(args):
    """Run the link-collection stage.

    Parameter source: command line (keywords, request file, pagination, concurrency, etc.)
    Flow:
    1. Collect keywords (--keyword/--keywords/--keywords-file)
    2. Verify that the list is not empty
    3. Call `save_links_multi` to search concurrently, deduplicate, and save to `args.out`
    """
    kws = []
    if args.keyword:
        kws.extend([k for k in args.keyword if k])
    if args.keywords:
        for k in args.keywords.split(','):
            k = k.strip()
            if k:
                kws.append(k)
    if args.keywords_file:
        kws.extend(load_keywords_from_file(args.keywords_file))
    kws = [k for k in kws if k]
    if not kws:
        raise SystemExit('no keywords')
    save_links_multi(kws, out_path=args.out, file_path=args.file_path, max_pages=args.max_pages, timeout=args.timeout, count=args.count, workers=args.workers)

def run_comments(args):
    """Run the comment and reply crawling stage.

    Input: `args.links_json` (either the unified snapshot or the simple structure)
    Reading logic: prefer the `links` field; if it is absent, aggregate `items[*].links`
    Call: `save_comments_from_links` crawls concurrently and outputs JSON plus an optional CSV
    """
    obj = read_json(args.links_json)
    links = obj.get('links') or []
    if not links and os.path.exists(args.links_json):
        try:
            for name in ['links', 'items']:
                if name == 'items':
                    tmp = []
                    for it in obj.get('items', []):
                        tmp.extend(it.get('links', []))
                    links = tmp
                    break
        except Exception:
            pass
    if not links:
        raise SystemExit('no links')
    save_comments_from_links(links, out_path=args.out, file_path=args.file_path, count=args.count, pages=args.pages, timeout=args.timeout, reply_count=args.reply_count, reply_pages=args.reply_pages, total_limit=args.limit, reply_total_limit=args.reply_limit, csv_path=args.csv, workers=args.workers)

def run_all(args):
    """Run link collection and comment crawling in sequence.

    1. Parse the keywords and run the search stage, writing to `args.links_out`
    2. Read the link snapshot, accepting both structures
    3. Run the comment crawling stage, writing to `args.comments_out` and optionally a CSV
    Suitable for running the whole pipeline in one go.
    """
    kws = []
    if getattr(args, 'keyword', None):
        kws.extend([k for k in args.keyword if k])
    if getattr(args, 'keywords', None):
        for k in args.keywords.split(','):
            k = k.strip()
            if k:
                kws.append(k)
    if getattr(args, 'keywords_file', None):
        kws.extend(load_keywords_from_file(args.keywords_file))
    kws = [k for k in kws if k]
    if not kws:
        raise SystemExit('no keywords')
    save_links_multi(kws, out_path=args.links_out, file_path=args.file_path, max_pages=args.search_max_pages, timeout=args.search_timeout, count=args.search_count, workers=args.search_workers)
    obj = read_json(args.links_out)
    links = obj.get('links') or []
    if not links and os.path.exists(args.links_out):
        try:
            for name in ['links', 'items']:
                if name == 'items':
                    tmp = []
                    for it in obj.get('items', []):
                        tmp.extend(it.get('links', []))
                    links = tmp
                    break
        except Exception:
            pass
    if not links:
        raise SystemExit('no links')
    save_comments_from_links(links, out_path=args.comments_out, file_path=args.file_path, count=args.comments_count, pages=args.comments_pages, timeout=args.comments_timeout, reply_count=args.reply_count, reply_pages=args.reply_pages, total_limit=args.comments_limit, reply_total_limit=args.reply_limit, csv_path=args.csv, workers=args.comments_workers)

def main():
    """Parse the command line and dispatch to the matching subcommand function."""
    p = argparse.ArgumentParser()
    sub = p.add_subparsers(dest='cmd')
    p_links = sub.add_parser('links')
    p_links.add_argument('--keyword', action='append')
    p_links.add_argument('--keywords', default=None)
    p_links.add_argument('--keywords-file', default=None)
    p_links.add_argument('--file-path', default=r'data\1.text')
    p_links.add_argument('--out', default='data\\urls.json')
    p_links.add_argument('--max-pages', type=int, default=50)
    p_links.add_argument('--count', type=int, default=None)
    p_links.add_argument('--timeout', type=int, default=30)
    p_links.add_argument('--workers', type=int, default=5)
    p_links.set_defaults(func=run_links)

    p_comments = sub.add_parser('comments')
    p_comments.add_argument('--links-json', default='data\\urls.json')
    p_comments.add_argument('--out', default='data\\tik_comments.json')
    # A raw string needs a single backslash; r'data\\1.text' would contain two.
    p_comments.add_argument('--file-path', default=r'data\1.text')
    p_comments.add_argument('--count', type=int, default=100)
    p_comments.add_argument('--pages', type=int, default=100)
    p_comments.add_argument('--timeout', type=int, default=30)
    p_comments.add_argument('--limit', type=int, default=None)
    p_comments.add_argument('--reply-count', type=int, default=100)
    p_comments.add_argument('--reply-pages', type=int, default=100)
    p_comments.add_argument('--reply-limit', type=int, default=None)
    p_comments.add_argument('--csv', default='data\\comments.csv')
    p_comments.add_argument('--workers', type=int, default=None)
    p_comments.set_defaults(func=run_comments)

    p_all = sub.add_parser('all')
    p_all.add_argument('--keyword', action='append')
    p_all.add_argument('--keywords', default=None)
    p_all.add_argument('--keywords-file', default=None)
    p_all.add_argument('--file-path', default=r'data\1.text')
    p_all.add_argument('--links-out', default='data\\urls.json')
    p_all.add_argument('--search-max-pages', type=int, default=50)
    p_all.add_argument('--search-count', type=int, default=None)
    p_all.add_argument('--search-timeout', type=int, default=30)
    p_all.add_argument('--search-workers', type=int, default=5)
    p_all.add_argument('--comments-out', default='data\\tik_comments.json')
    p_all.add_argument('--comments-count', type=int, default=100)
    p_all.add_argument('--comments-pages', type=int, default=100)
    p_all.add_argument('--comments-timeout', type=int, default=30)
    p_all.add_argument('--comments-limit', type=int, default=None)
    p_all.add_argument('--reply-count', type=int, default=100)
    p_all.add_argument('--reply-pages', type=int, default=100)
    p_all.add_argument('--reply-limit', type=int, default=None)
    p_all.add_argument('--csv', default='data\\comments.csv')
    p_all.add_argument('--comments-workers', type=int, default=None)
    p_all.set_defaults(func=run_all)

    p_mysql = sub.add_parser('mysql')
    p_mysql.add_argument('--csv', default='data\\comments.csv')
    p_mysql.add_argument('--host', default='localhost')
    p_mysql.add_argument('--port', type=int, default=3306)
    p_mysql.add_argument('--user', default='root')
    p_mysql.add_argument('--password', default='')
    p_mysql.add_argument('--database', default='crawler_tiktok')
    p_mysql.add_argument('--table', default='comments')
    def run_mysql(args):
        import_csv_to_mysql(args.csv, host=args.host, port=args.port, user=args.user, password=args.password, database=args.database, table=args.table)
    p_mysql.set_defaults(func=run_mysql)

    p_mysql_db = sub.add_parser('mysql-db')
    p_mysql_db.add_argument('--host', default='localhost')
    p_mysql_db.add_argument('--port', type=int, default=3306)
    p_mysql_db.add_argument('--user', default='root')
    p_mysql_db.add_argument('--password', default='')
    p_mysql_db.add_argument('--database', default='yunque')
    def run_mysql_db(args):
        create_database_if_not_exists(host=args.host, port=args.port, user=args.user, password=args.password, database=args.database)
    p_mysql_db.set_defaults(func=run_mysql_db)

    args = p.parse_args()
    if not args.cmd:
        p.print_help()
        raise SystemExit(1)
    args.func(args)

if __name__ == '__main__':
    main()
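"""Command-line entry module.

Provides three kinds of crawl subcommands:
- links: search video links concurrently by keyword and save a snapshot
- comments: crawl comments and replies from a list of links and save a snapshot and CSV
- all: chain links and comments to run the whole flow in one go
Running it as `python -m crawler_tiktok.main ...` is recommended to avoid import path problems.
"""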
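The link-reading rule that `run_comments` and `run_all` describe (prefer the top-level `links` field, otherwise aggregate `items[*].links`) can be written more compactly. The sketch below is one possible simplification under that reading; `collect_links` is a hypothetical name, not a function in this repository, and the sample dictionaries are made up.

```python
def collect_links(obj):
    """Hypothetical helper: prefer top-level 'links', else flatten items[*].links."""
    links = obj.get('links') or []
    if not links:
        for item in obj.get('items') or []:
            links.extend(item.get('links') or [])
    return links


# Unified snapshot: the top-level list wins.
unified = {'links': ['a'], 'items': [{'links': ['a', 'b']}]}
# Snapshot with only per-keyword items: links are flattened in order.
items_only = {'items': [{'links': ['a']}, {'links': ['b', 'c']}]}

print(collect_links(unified))
print(collect_links(items_only))
```

Factoring the rule into a single helper would let both subcommands share it instead of repeating the loop, though that refactor is only a suggestion here.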
BIN
tiktok/__pycache__/comments.cpython-312.pyc
Normal file
Binary file not shown.
BIN
tiktok/__pycache__/search.cpython-312.pyc
Normal file
Binary file not shown.
208
tiktok/comments.py
Normal file
@@ -0,0 +1,208 @@
import json
import re
import threading
import time
from urllib.parse import urlparse, parse_qs, urlencode
from urllib.request import Request, urlopen
from core.curl import parse_curl_file
from utils.io import ensure_csv_header, append_csv_rows
from data.store import save_comments_snapshot

def _extract_aweme_id(link):
    """Extract the aweme_id from a video link (/video/<id>)."""
    m = re.search(r"/video/(\d+)", link)
    return m.group(1) if m else None

def fetch_comments_aweme(aweme_id, file_path, count=20, max_pages=50, timeout=30, total_limit=None, referer=None):
    """Fetch the comments of one video page by page.

    Parameters:
    - `aweme_id`: video ID
    - `file_path`: cURL text file (block 1 is the base request for the comment API)
    - `count/max_pages/timeout`: pagination and timeout controls
    - `total_limit`: cap on the total number of comments (optional)
    - `referer`: referring page to set in the request headers (optional)
    Behavior: retries on failure and switches to a fallback comment API when needed; handles `has_more/next_cursor`.
    Returns: a list of comment objects.
    """
    reqs = parse_curl_file(file_path)
    if not reqs:
        return []
    base = reqs[0]
    headers = dict(base['headers'])
    if referer:
        headers['referer'] = referer
    cursor = 0
    all_comments = []
    for _ in range(max_pages):
        u_parsed = urlparse(base['url'])
        q = parse_qs(u_parsed.query)
        q['aweme_id'] = [str(aweme_id)]
        q['count'] = [str(count)]
        q['cursor'] = [str(cursor)]
        u = u_parsed._replace(query=urlencode(q, doseq=True)).geturl()
        data = None
        # Up to three attempts with a growing delay between them.
        for i in range(3):
            try:
                req = Request(u, headers=headers, method='GET')
                with urlopen(req, timeout=timeout) as resp:
                    data = resp.read()
                break
            except Exception:
                time.sleep(0.5 * (i + 1))
                data = None
        try:
            obj = json.loads(data.decode('utf-8', errors='ignore'))
        except Exception:
            obj = {}
        if not obj.get('comments'):
            # Fall back to the public comment list endpoint.
            alt_params = {'aid': 1988, 'aweme_id': aweme_id, 'count': count, 'cursor': cursor}
            alt_url = 'https://www.tiktok.com/api/comment/list/?' + urlencode(alt_params)
            for i in range(2):
                try:
                    req = Request(alt_url, headers=headers, method='GET')
                    with urlopen(req, timeout=timeout) as resp:
                        data2 = resp.read()
                    obj2 = json.loads(data2.decode('utf-8', errors='ignore'))
                    if obj2.get('comments'):
                        obj = obj2
                        break
                except Exception:
                    time.sleep(0.5 * (i + 1))
        comments = obj.get('comments') or []
        for c in comments:
            all_comments.append(c)
        # Stop paginating once the cap is reached.
        if isinstance(total_limit, int) and total_limit > 0 and len(all_comments) >= total_limit:
            break
        has_more = obj.get('has_more')
        next_cursor = obj.get('cursor') or obj.get('next_cursor')
        if has_more in (True, 1) and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        if comments and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        break
    return all_comments

def fetch_replies(comment_id, aweme_id, file_path, count=20, max_pages=50, timeout=30, total_limit=None):
    """Fetch the second-level replies of one comment page by page.

    Parameters: `comment_id/aweme_id` identify the comment; the rest match the comment fetcher.
    Returns: a list of reply objects.
    """
    reqs = parse_curl_file(file_path)
    if not reqs:
        return []
    headers = reqs[0]['headers']
    base = 'https://www.tiktok.com/api/comment/list/reply/'
    cursor = 0
    replies = []
    for _ in range(max_pages):
        params = {'aid': 1988, 'aweme_id': aweme_id, 'comment_id': comment_id, 'count': count, 'cursor': cursor}
        url = base + '?' + urlencode(params)
        data = None
        for i in range(3):
            try:
                req = Request(url, headers=headers, method='GET')
                with urlopen(req, timeout=timeout) as resp:
                    data = resp.read()
                break
            except Exception:
                time.sleep(0.5 * (i + 1))
                data = None
        try:
            obj = json.loads(data.decode('utf-8', errors='ignore'))
        except Exception:
            obj = {}
        arr = obj.get('comments') or []
        for r in arr:
            replies.append(r)
        # Stop paginating once the cap is reached.
        if isinstance(total_limit, int) and total_limit > 0 and len(replies) >= total_limit:
            break
        has_more = obj.get('has_more')
        next_cursor = obj.get('cursor')
        if has_more in (True, 1) and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        break
    return replies

_csv_lock = threading.Lock()
|
||||
_print_lock = threading.Lock()
|
||||
_results_lock = threading.Lock()
|
||||
|
||||
def save_comments_from_links(links, out_path, file_path, count=20, pages=50, timeout=30, reply_count=20, reply_pages=50, total_limit=None, reply_total_limit=None, csv_path=None, workers=None):
|
||||
"""并发从视频链接抓取评论与回复并保存快照
|
||||
|
||||
并发:可选信号量限制;每个链接独立线程抓取;
|
||||
CSV:若提供 `csv_path`,按 `username,text` 追加主评论与回复;
|
||||
输出:写入 `out_path`,结构为 `{'items': [{link,count,comments: [...]}, ...]}`。
|
||||
"""
|
||||
ensure_csv_header(csv_path, ['username', 'text'])
|
||||
    results = []
    # Optional cap on how many links are processed at the same time.
    sem = None
    if isinstance(workers, int) and workers > 0:
        sem = threading.Semaphore(workers)

    def _process(link):
        if sem:
            sem.acquire()
        with _print_lock:
            print(f"[START] {link}", flush=True)
        try:
            cs = fetch_comments_aweme(
                _extract_aweme_id(link), file_path=file_path, count=count,
                max_pages=pages, timeout=timeout, total_limit=total_limit, referer=link)
            enriched = []
            for c in cs:
                cid = c.get('cid')
                if cid:
                    # Fetch second-level replies and attach them to a copy of the comment.
                    rs = fetch_replies(
                        cid, _extract_aweme_id(link), file_path=file_path, count=reply_count,
                        max_pages=reply_pages, timeout=timeout, total_limit=reply_total_limit)
                    c = dict(c)
                    c['replies'] = rs
                    c['reply_count'] = len(rs)
                enriched.append(c)
                try:
                    with _print_lock:
                        print(f"{link} | cid={c.get('cid')} | create_time={c.get('create_time')} | reply_count={c.get('reply_count', 0)} | text={c.get('text')}", flush=True)
                except Exception:
                    pass
                if csv_path:
                    # One CSV row for the comment, then one row per reply.
                    u = c.get('user') or {}
                    uname = u.get('unique_id') or u.get('nickname') or u.get('uid') or ''
                    rows = [[uname, c.get('text')]]
                    for r in c.get('replies', []) or []:
                        ru = r.get('user') or {}
                        runame = ru.get('unique_id') or ru.get('nickname') or ru.get('uid') or ''
                        rows.append([runame, r.get('text')])
                    with _csv_lock:
                        append_csv_rows(csv_path, rows)
            with _results_lock:
                results.append({'link': link, 'count': len(cs), 'comments': enriched})
            reply_total = sum(len(c.get('replies') or []) for c in enriched)
            with _print_lock:
                print(f"[DONE] {link} comments={len(cs)} replies={reply_total}", flush=True)
        except Exception as e:
            with _print_lock:
                print(f"[ERROR] {link} {e}", flush=True)
        finally:
            if sem:
                sem.release()

    # One thread per link; the semaphore above limits how many run concurrently.
    threads = []
    for link in links:
        t = threading.Thread(target=_process, args=(link,))
        t.daemon = True
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    save_comments_snapshot(out_path, results)
    return out_path
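"""TikTok comment and reply scraping module

Capabilities:
- Extract the aweme_id from a video link
- Fetch comments page by page through the comment API (with a fallback endpoint)
- Fetch the second-level replies of each comment and aggregate them
- Optionally write to CSV and print progress logs
"""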
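The thread-per-link loop above limits concurrency with a semaphore. A minimal standalone sketch of that pattern follows. `run_bounded` and `handler` are hypothetical names used for illustration only and are not part of this commit; the sketch also stores a handler's exception as its result, which the original code does not do (it prints an `[ERROR]` line instead).

```python
import threading


def run_bounded(items, handler, workers=5):
    """Run handler(item) in one thread per item, with at most `workers` running at once."""
    sem = threading.Semaphore(max(1, int(workers)))
    results = []
    results_lock = threading.Lock()

    def _run(item):
        # `with sem` releases the slot even if the handler raises.
        with sem:
            try:
                value = handler(item)
            except Exception as exc:  # keep the failure instead of losing the item
                value = exc
            with results_lock:
                results.append((item, value))

    threads = [threading.Thread(target=_run, args=(item,), daemon=True) for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Completion order is not deterministic, so sort before printing.
    print(sorted(run_bounded([1, 2, 3, 4], lambda x: x * 2, workers=2)))
```

Using `with sem:` instead of a separate `acquire()` / `release()` pair keeps the release tied to the block, so a failure between the two calls cannot leak a slot.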
171
tiktok/search.py
Normal file
@@ -0,0 +1,171 @@
import json
import re
import threading
import time
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse
from urllib.request import Request, urlopen
from core.curl import parse_curl_file
from data.store import save_links_snapshot

def _update_query(url, updates):
    """Return a new URL: the original URL with its query parameters updated from `updates`."""
    p = urlparse(url)
    q = parse_qs(p.query)
    for k, v in updates.items():
        q[k] = [str(v)]
    new_q = urlencode(q, doseq=True)
    return urlunparse((p.scheme, p.netloc, p.path, p.params, new_q, p.fragment))

def _extract_links(obj):
    """Extract video links from a response object.

    Links are built first from `data -> item -> author.uniqueId + item.id`;
    every string field is also scanned with regexes for TikTok links as a fallback.
    Returns: a list of links (may contain duplicates; the caller deduplicates).
    """
    links = []
    data = obj.get('data') if isinstance(obj, dict) else None
    if isinstance(data, list):
        for e in data:
            if isinstance(e, dict) and e.get('type') == 1 and isinstance(e.get('item'), dict):
                it = e['item']
                author = it.get('author') or {}
                uid = author.get('uniqueId')
                vid = it.get('id')
                if uid and vid:
                    links.append(f"https://www.tiktok.com/@{uid}/video/{vid}")
    patterns = [
        r"https?://www\.tiktok\.com/[\w@._-]+/video/\d+",
        r"https?://www\.tiktok\.com/video/\d+",
        r"https?://vm\.tiktok\.com/[\w-]+",
        r"https?://vt\.tiktok\.com/[\w-]+",
    ]

    def rec(x):
        # Walk nested dicts and lists; match the patterns against every string.
        if isinstance(x, dict):
            for v in x.values():
                rec(v)
        elif isinstance(x, list):
            for v in x:
                rec(v)
        elif isinstance(x, str):
            s = x
            for pat in patterns:
                for m in re.finditer(pat, s):
                    links.append(m.group(0))

    rec(obj)
    return links

def search_video_links(keyword, file_path, max_pages=50, timeout=30, count=None, on_link=None):
    """Search video links for one keyword, page by page.

    Input: the base URL and headers come from the 2nd curl request in `file_path`.
    Behavior: paginate, retry, and parse links; call `on_link` for each new link.
    Returns: the links found, deduplicated within this keyword; deduplication
    across keywords is left to the caller.
    """
    reqs = parse_curl_file(file_path)
    if len(reqs) < 2:
        return []
    base = reqs[1]
    headers = base['headers']
    parsed = urlparse(base['url'])
    q = parse_qs(parsed.query)
    if count is None:
        # Fall back to the page size in the captured URL, then to 12.
        if 'count' in q:
            try:
                count = int(q['count'][0])
            except Exception:
                count = 12
        else:
            count = 12
    all_links = []
    seen = set()
    offset = 0
    cursor = None
    for _ in range(max_pages):
        params = {'keyword': keyword, 'count': count}
        if cursor is not None:
            params['offset'] = cursor
        else:
            params['offset'] = offset
        u = _update_query(base['url'], params)
        data = None
        # Up to 3 attempts, sleeping 0.5s, 1.0s, 1.5s after each failure.
        for i in range(3):
            try:
                req = Request(u, headers=headers, method='GET')
                with urlopen(req, timeout=timeout) as resp:
                    data = resp.read()
                break
            except Exception:
                time.sleep(0.5 * (i + 1))
                data = None
        try:
            obj = json.loads(data.decode('utf-8', errors='ignore'))
        except Exception:
            # Covers both a failed request (data is None) and invalid JSON.
            obj = {}
        links = _extract_links(obj)
        # A JSON body that is not an object has no `has_more` / `cursor` fields.
        meta = obj if isinstance(obj, dict) else {}
        has_more = meta.get('has_more')
        next_cursor = meta.get('cursor')
        new = 0
        for l in links:
            if l not in seen:
                seen.add(l)
                all_links.append(l)
                new += 1
                if on_link:
                    try:
                        on_link(l)
                    except Exception:
                        pass
        # Prefer the server-provided cursor; otherwise advance the offset by one page.
        if has_more in (True, 1) and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        if new == 0:
            break
        offset += count
    return all_links

_print_lock = threading.Lock()


def save_links_multi(keywords, out_path, file_path, max_pages=50, timeout=30, count=None, workers=5):
    """Search several keywords concurrently and save a snapshot.

    Concurrency: one thread per keyword, limited by a semaphore; links are
    deduplicated across all keywords.
    Output: writes `out_path`, containing `keywords/items/total_count/links`.
    """
    all_links = []
    seen = set()
    items = []
    seen_lock = threading.Lock()
    sem = threading.Semaphore(max(1, int(workers)))

    def worker(kw):
        with sem:
            item_links = []

            def on_new(l):
                # Record a link only the first time any keyword finds it.
                with seen_lock:
                    if l not in seen:
                        seen.add(l)
                        all_links.append(l)
                        item_links.append(l)
                        with _print_lock:
                            print(l, flush=True)

            search_video_links(kw, file_path=file_path, max_pages=max_pages, timeout=timeout, count=count, on_link=on_new)
            items.append({'keyword': kw, 'count': len(item_links), 'links': item_links})

    threads = []
    for kw in keywords:
        t = threading.Thread(target=worker, args=(kw,))
        t.daemon = True
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    save_links_snapshot(out_path, keywords, items, all_links)
    return out_path
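"""TikTok video link search module

Core capabilities:
- Build the query URL (update parameters such as keyword/offset/count)
- Send requests and parse video links from the response (structured fields plus a regex fallback)
- Search several keywords concurrently, deduplicate across them, and save a snapshot
"""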
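The search code pages through results by rewriting `keyword`, `offset`, and `count` on a captured URL. The sketch below shows that rewrite on its own. It is a standalone variant written for illustration, not the module's function: `update_query` and the `example.com` URL are hypothetical, and it passes `keep_blank_values=True`, because `parse_qs` drops parameters with empty values by default. Whether the real endpoint needs those empty parameters preserved is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def update_query(url, updates):
    """Return `url` with the given query parameters set, keeping all other parameters."""
    parts = urlparse(url)
    # keep_blank_values=True preserves parameters such as "device=" that have no value.
    query = parse_qs(parts.query, keep_blank_values=True)
    for key, value in updates.items():
        query[key] = [str(value)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical captured URL; a real one would come from the cURL text file.
base = "https://example.com/api/search?keyword=old&count=12&device="

# One URL per page: the offset advances by the page size (12 here).
page_urls = [
    update_query(base, {"keyword": "new pen", "offset": page * 12})
    for page in range(3)
]

if __name__ == "__main__":
    for u in page_urls:
        print(u)
```

Parameters that are not listed in `updates` pass through unchanged, so the rest of the captured request stays intact.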
BIN
utils/__pycache__/io.cpython-312.pyc
Normal file
Binary file not shown.
55
utils/filter_comments.py
Normal file
@@ -0,0 +1,55 @@
import argparse
import csv
import os

def filter_comments(csv_in, csv_out, keywords):
    """Copy the rows of `csv_in` whose text contains any keyword (case-insensitive) to `csv_out`."""
    ks = set(k.lower() for k in keywords if k)
    rows_out = []
    with open(csv_in, 'r', encoding='utf-8', newline='') as f:
        r = csv.reader(f)
        first = True
        for row in r:
            # Skip the header row if the file has one.
            if first and row and row[0].lower() == 'username':
                first = False
                continue
            first = False
            if not row:
                continue
            text = row[1] if len(row) > 1 else ''
            s = (text or '').lower()
            if any(k in s for k in ks):
                rows_out.append(row)
    # os.makedirs('') raises, so only create the directory when the path has one.
    out_dir = os.path.dirname(csv_out)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    with open(csv_out, 'w', encoding='utf-8', newline='') as wf:
        w = csv.writer(wf)
        w.writerow(['username', 'text'])
        for row_out in rows_out:
            w.writerow(row_out)
    print(f"input={csv_in} keywords={len(ks)} matched_rows={len(rows_out)} out={csv_out}")

def main():
    p = argparse.ArgumentParser()
    p.add_argument('--extern-keywords', default=r'd:\work\test\test\all_keywords.txt')
    p.add_argument('--local-keywords', default=r'data\keyword.txt')
    p.add_argument('--csv-in', default=r'data\comments.csv')
    p.add_argument('--csv-out', default=r'data\key_comment.csv')
    args = p.parse_args()

    def _load(path):
        # Read one keyword per line; a missing or unreadable file yields an empty list.
        arr = []
        try:
            with open(path, 'r', encoding='utf-8') as f:
                for line in f:
                    s = line.strip()
                    if s:
                        arr.append(s)
        except Exception:
            arr = []
        return arr

    kws = []
    kws.extend(_load(args.extern_keywords))
    kws.extend(_load(args.local_keywords))
    # Hard-coded keyword that is always included in the filter.
    kws.append('pen')
    filter_comments(args.csv_in, args.csv_out, kws)


if __name__ == '__main__':
    main()
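The filter keeps a row when any keyword occurs as a substring of the lower-cased comment text. A small in-memory sketch of that matching rule follows; `filter_rows` and the sample rows are hypothetical and exist only to show the behavior. Because the match is a plain substring test, a short keyword such as `pen` also matches words like `open`.

```python
import csv
import io


def filter_rows(rows, keywords):
    """Keep rows whose second column contains any keyword, ignoring case."""
    ks = {k.lower() for k in keywords if k}
    return [
        row for row in rows
        if len(row) > 1 and any(k in (row[1] or "").lower() for k in ks)
    ]


# Hypothetical sample data in the same two-column layout (username, text).
sample = "username,text\nalice,I love this PEN\nbob,nice video\ncarol,open the box\n"
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
matched = filter_rows(reader, ["pen"])

if __name__ == "__main__":
    for row in matched:
        print(row)
```

If whole-word matching is wanted instead, a word-boundary regular expression would be one option, at the cost of handling languages that do not separate words with spaces.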
54
utils/io.py
Normal file
@@ -0,0 +1,54 @@
import json
import os
import csv

def load_keywords_from_file(path):
    """Read a keyword file line by line, skip blank lines, and return a list."""
    arr = []
    try:
        with open(path, 'r', encoding='utf-8') as f:
            for line in f:
                s = line.strip()
                if s:
                    arr.append(s)
    except Exception:
        arr = []
    return arr

def write_json(path, obj):
    """Write JSON as UTF-8, keeping non-ASCII characters and using indentation."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(obj, f, ensure_ascii=False, indent=2)


def read_json(path):
    """Read a JSON file; return an empty object on failure."""
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except Exception:
        return {}

def ensure_csv_header(path, headers):
    """Create the CSV and write the header if the file does not exist; return immediately for an empty path."""
    if not path:
        return
    if not os.path.exists(path):
        with open(path, 'w', newline='', encoding='utf-8') as wf:
            w = csv.writer(wf)
            w.writerow(headers)


def append_csv_rows(path, rows):
    """Append rows (each given as a list) to the CSV; return immediately for an empty path."""
    if not path:
        return
    with open(path, 'a', newline='', encoding='utf-8') as af:
        w = csv.writer(af)
        for r in rows:
            w.writerow(r)
"""General IO utilities

Provides:
- Keyword file loading
- JSON reading and writing
- CSV writing (ensure header, append rows)
"""
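The two CSV helpers above are meant to be used together: write the header once, then append rows. A minimal sketch that folds both steps into a single call is shown below. `append_with_header` is a hypothetical helper, not part of this commit, and it also treats an existing but empty file as new, which the original `ensure_csv_header` does not.

```python
import csv
import os
import tempfile


def append_with_header(path, headers, rows):
    """Append rows to a CSV file, writing the header first if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline='' lets the csv module control line endings, as the csv docs recommend.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(headers)
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "comments.csv")
    append_with_header(path, ["username", "text"], [["alice", "hello"]])
    append_with_header(path, ["username", "text"], [["bob", "你好"]])
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

if __name__ == "__main__":
    print(rows)
```

When several threads write to the same file, the call still needs an external lock, as the comment scraper does with its CSV lock.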