init commit
194
README.md
Normal file
@@ -0,0 +1,194 @@
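# crawler_tiktok

A TikTok data-scraping script that runs in two stages:

- Search for video links by keyword and write a snapshot (`links`)
- Fetch comments and second-level replies for those video links, writing a snapshot and an optional CSV (`comments`)

The crawler itself uses only the Python standard library (`urllib`, `threading`, etc.); the optional MySQL import is the one step that needs a third-party package (`pymysql`).
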
## Directory structure

```
crawler_tiktok/
├─ core/                 # cURL text parsing and request sending
│  └─ curl.py
├─ tiktok/               # TikTok-specific logic
│  ├─ search.py          # keyword search for video links
│  └─ comments.py        # fetch comments and second-level replies
├─ data/                 # sample data and output directory
│  ├─ 1.text             # cURL text (contains multiple curl command blocks)
│  ├─ keyword.txt        # keyword file (one keyword per line)
│  ├─ urls.json          # link-search snapshot output (sample included)
│  ├─ comments.csv       # comment CSV output (optional)
│  └─ store.py           # shared snapshot-writing helper
├─ utils/                # general I/O helpers
│  └─ io.py
├─ main.py               # command-line entry point (subcommands: links / comments / all)
└─ __init__.py           # package entry point
```

## Preparation

- Install Python (3.8+ recommended)
- Prepare `data/1.text`:
  - Open TikTok in a browser and log in. In the developer tools' Network panel, select the relevant request and copy it with "Copy as cURL".
  - Put the comment-API `curl ...` command first and the search-API `curl ...` command second; a line break between the two is enough.
  - Keep the request headers (especially `cookie`) so the API responds normally.
- Prepare the keyword file `data/keyword.txt` (one keyword per line), or pass keywords on the command line.

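As a rough illustration of the two-block layout described above, the sketch below splits a small inline sample at each `curl` keyword and pulls out the first quoted URL of each block. The sample text, the `example.com` URLs, and the helper names `split_curl_blocks` and `first_url` are made up for this example; the project's own parser lives in `core/curl.py`.

```python
import re

# Hypothetical stand-in for data/1.text: comment API first, search API second.
SAMPLE = """curl 'https://example.com/api/comment/list/?count=20' \\
  -H 'accept: */*'

curl 'https://example.com/api/search/general/full/?count=16' \\
  -H 'accept: */*'
"""


def split_curl_blocks(text):
    """Cut the text at every standalone `curl ` keyword."""
    starts = [m.start() for m in re.finditer(r"\bcurl\s", text)]
    return [text[s:e] for s, e in zip(starts, starts[1:] + [len(text)])]


def first_url(block):
    """Return the first quoted argument after `curl`, or None."""
    m = re.search(r"curl\s+(['\"])(.*?)\1", block, re.S)
    return m.group(2) if m else None


blocks = split_curl_blocks(SAMPLE)
comment_url, search_url = (first_url(b) for b in blocks[:2])
print(comment_url)
print(search_url)
```

If the two commands are pasted in the opposite order, the comment and search stages would each pick up the wrong baseline request, so the order in the file matters.
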
## Quick start

From the repository root (which must be `D:\work\crawler_tiktok`), run the script directly:

```
python main.py -h
```


### 1) Search for video links (links)

Searches the keywords concurrently, de-duplicates the results, and saves them to `urls.json`.

```
python main.py links \
  --keywords-file data\keyword.txt \
  --file-path data\1.text \
  --out data\urls.json \
  --max-pages 50 \
  --count 12 \
  --timeout 30 \
  --workers 5
```

Optional:
- Pass `--keyword` repeatedly to supply multiple keywords (can be combined with `--keywords-file`)
- `--keywords`: a comma-separated string of keywords

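The three keyword sources end up as one de-duplicated list. A minimal sketch of such a merge is shown here; the function name `merge_keywords`, its parameters, and the "first occurrence wins" ordering are assumptions for illustration, not the actual code in `main.py`.

```python
def merge_keywords(repeated=(), comma_separated="", file_lines=()):
    """Combine the three keyword sources, dropping blanks and duplicates."""
    candidates = list(repeated)
    candidates += comma_separated.split(",")
    candidates += list(file_lines)
    merged, seen = [], set()
    for raw in candidates:
        kw = raw.strip()  # file lines usually carry a trailing newline
        if kw and kw not in seen:
            seen.add(kw)
            merged.append(kw)
    return merged


print(merge_keywords(
    repeated=["acrylic marker pens"],
    comma_separated="liquid marker pen, acrylic marker pens",
    file_lines=["direct liquid pen\n", "\n"],
))
```
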
Example structure of the `urls.json` output:

```json
{
  "keywords": ["xxx", "yyy"],
  "items": [
    {"keyword": "xxx", "count": 10, "links": ["https://www.tiktok.com/@user/video/123", ...]},
    {"keyword": "yyy", "count": 8, "links": [ ... ]}
  ],
  "total_count": 17855,
  "links": ["https://www.tiktok.com/@user/video/123", ...]
}
```

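To make the snapshot layout concrete, here is a sketch that assembles a dictionary of the same shape from per-keyword results. The function name `build_links_snapshot` is hypothetical, and treating `total_count` as the number of unique links is an assumption; the real writer in `store.py` may compute it differently.

```python
import json


def build_links_snapshot(results):
    """results: list of (keyword, links) pairs, one per searched keyword."""
    items, all_links, seen = [], [], set()
    for keyword, links in results:
        items.append({"keyword": keyword, "count": len(links), "links": list(links)})
        for link in links:
            if link not in seen:  # a video can match several keywords
                seen.add(link)
                all_links.append(link)
    return {
        "keywords": [keyword for keyword, _ in results],
        "items": items,
        "total_count": len(all_links),  # assumption: counts unique links
        "links": all_links,
    }


snapshot = build_links_snapshot([
    ("xxx", ["https://www.tiktok.com/@user/video/123",
             "https://www.tiktok.com/@user/video/456"]),
    ("yyy", ["https://www.tiktok.com/@user/video/123"]),
])
print(json.dumps(snapshot, indent=2))
```
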
### 2) Fetch comments and replies (comments)

Reads links from the link snapshot, fetches top-level comments and second-level replies, and saves JSON plus an optional CSV.

```
python main.py comments \
  --links-json data\urls.json \
  --out data\tik_comments.json \
  --file-path data\1.text \
  --count 100 \
  --pages 100 \
  --timeout 30 \
  --reply-count 100 \
  --reply-pages 100 \
  --csv data\comments.csv \
  --workers 8
```

Example structure of the `tik_comments.json` output:

```json
{
  "items": [
    {
      "link": "https://www.tiktok.com/@user/video/123",
      "count": 42,
      "comments": [
        {
          "cid": "xxx",
          "text": "...",
          "user": {"unique_id": "..."},
          "replies": [{"text": "..."}, ...],
          "reply_count": 3
        }
      ]
    }
  ]
}
```

If `--csv` is given, top-level comments and replies are each appended to that file as `username,text` rows.

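The mapping from the JSON snapshot to `username,text` rows can be sketched as follows. This is an illustration only: the helper names `iter_rows` and `_row` are invented, and it assumes a reply may carry the same `user.unique_id` field as a comment, falling back to an empty username when it does not.

```python
import csv
import io


def iter_rows(snapshot):
    """Yield (username, text) for each comment, then for each of its replies."""
    for item in snapshot.get("items", []):
        for comment in item.get("comments", []):
            yield _row(comment)
            for reply in comment.get("replies", []):
                yield _row(reply)


def _row(entry):
    # Assumption: replies may carry the same `user.unique_id` field as
    # comments; fall back to an empty username when it is absent.
    username = (entry.get("user") or {}).get("unique_id", "")
    return username, entry.get("text", "")


snapshot = {"items": [{
    "link": "https://www.tiktok.com/@user/video/123",
    "count": 1,
    "comments": [{"cid": "1", "text": "nice", "user": {"unique_id": "alice"},
                  "replies": [{"text": "agreed"}], "reply_count": 1}],
}]}

buffer = io.StringIO()  # a real run would open the CSV path in append mode
csv.writer(buffer).writerows(iter_rows(snapshot))
print(buffer.getvalue())
```

Using the `csv` module rather than joining with commas keeps texts that contain commas, quotes, or newlines in a single field.
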
### 3) End-to-end pipeline (all)

Chains link search and comment fetching in a single run, which suits pipeline execution:

```
python main.py all \
  --keywords-file data\keyword.txt \
  --file-path data\1.text \
  --links-out data\urls.json \
  --search-max-pages 50 \
  --search-count 12 \
  --search-timeout 30 \
  --search-workers 5 \
  --comments-out data\tik_comments.json \
  --comments-count 100 \
  --comments-pages 100 \
  --comments-timeout 30 \
  --comments-limit 1000 \
  --reply-count 100 \
  --reply-pages 100 \
  --reply-limit 2000 \
  --csv data\comments.csv \
  --comments-workers 8
```

### 4) Write to MySQL (import from CSV)

Run from `D:\work\crawler_tiktok`:

```
pip install pymysql
python main.py mysql \
  --csv data\comments.csv \
  --host localhost \
  --port 3306 \
  --user root \
  --password <your password> \
  --database crawler_tiktok \
  --table comments
```

If the database does not exist, create it in MySQL first:

```
CREATE DATABASE IF NOT EXISTS `crawler_tiktok` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

The import creates the table in the specified database if it does not exist, then batch-inserts the two columns `username,text`.

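For readers curious how a batched CSV import might look, here is a sketch under stated assumptions: the column names `username` and `text`, the batch size of 500, and the helper names are all invented for illustration, and table creation is left out. `import_rows` expects a DB-API connection that uses `%s` placeholders, such as one returned by `pymysql.connect(...)`; it is defined but not called here, so the snippet runs without a database.

```python
import csv
import io


def read_rows(csv_file):
    """Yield (username, text) pairs, skipping rows without two columns."""
    for row in csv.reader(csv_file):
        if len(row) >= 2:
            yield row[0], row[1]


def batches(rows, size):
    """Group an iterable into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def import_rows(conn, table, rows, batch_size=500):
    """Insert rows through a DB-API connection such as pymysql's."""
    # `table` is interpolated directly, so pass only a trusted identifier.
    sql = f"INSERT INTO `{table}` (`username`, `text`) VALUES (%s, %s)"
    with conn.cursor() as cur:
        for batch in batches(rows, batch_size):
            cur.executemany(sql, batch)
    conn.commit()


sample = io.StringIO('alice,nice\nbob,"great, thanks"\n')
print(list(batches(read_rows(sample), 1)))
```

Passing values as parameters rather than formatting them into the SQL string keeps comment text with quotes or emoji from breaking the statement.
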
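## Key parameters

- `--keyword / --keywords / --keywords-file`: three ways to supply keywords; they are merged and de-duplicated.
- `--file-path`: path to the cURL text file (contains multiple `curl ...` command blocks).
  - Block 1 is the baseline for the comment API.
  - Block 2 is the baseline for the search API.
- Search stage: `--max-pages` caps the number of pagination rounds; `--count` is the number of items per page (inferred from the URL by default, usually 12); `--workers` is the number of concurrent threads.
- Comment stage: `--pages` caps the number of comment pages; `--count` is the number of comments per page; `--reply-count` is the number of replies per page and `--reply-pages` caps the number of reply pages; `--workers` is the number of concurrent fetch threads.
- `--timeout`: request timeout in seconds.
- `--csv`: if given, top-level comments and replies are appended to this CSV as `username,text`.
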
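The note that `--count` defaults to a value inferred from the URL can be pictured with the sketch below, which reads `count` from the query string and rewrites a query parameter for the next page. The helper names, the fallback of 12, and the idea of paging by advancing `offset` are assumptions for illustration; `tiktok/search.py` may work differently.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def infer_count(url, default=12):
    """Read the `count` query parameter, falling back to `default`."""
    values = parse_qs(urlsplit(url).query).get("count")
    try:
        return int(values[0]) if values else default
    except ValueError:
        return default


def with_params(url, **overrides):
    """Return `url` with the given query parameters replaced or added."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    for name, value in overrides.items():
        query[name] = [str(value)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


base = "https://www.tiktok.com/api/search/general/full/?count=16&offset=0&keyword=marker"
count = infer_count(base)
print(count)
print(with_params(base, offset=count))
```
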
## Output file conventions

- `data/urls.json`: link-search snapshot containing `keywords/items/total_count/links`.
- `data/tik_comments.json`: comment snapshot containing `items` (each with `link/count/comments`).
- `data/comments.csv`: comments and replies in CSV format (username, text).

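## FAQ

- Empty responses or errors: first check that the cURL commands in `data/1.text` are still valid and that the `cookie` has not expired.
- Rate limiting: lower `--workers`, raise `--timeout`, or run in batches.
- Windows paths: the examples use backslashes; on Unix-like systems, use `/` instead.
- Progress output: the crawl prints START/DONE/ERROR plus comment statistics so you can follow execution status.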
10
__init__.py
Normal file
@@ -0,0 +1,10 @@
"""crawler_tiktok package

Collects video links from TikTok and fetches their comments and replies.
Modules:
- core: base capabilities (e.g. parsing URLs and request headers from curl text)
- tiktok: TikTok-specific scraping logic (search, comments)
- utils: general I/O helpers (JSON/CSV read/write, keyword-file loading)
- data: snapshot-writing helper (store) and sample data files
Entry point: main.py provides the command-line subcommands links/comments/all.
"""
BIN
__pycache__/__init__.cpython-312.pyc
Normal file
Binary file not shown.
BIN
__pycache__/main.cpython-312.pyc
Normal file
Binary file not shown.
BIN
core/__pycache__/curl.cpython-312.pyc
Normal file
Binary file not shown.
BIN
core/__pycache__/har.cpython-312.pyc
Normal file
Binary file not shown.
BIN
core/__pycache__/store.cpython-312.pyc
Normal file
Binary file not shown.
74
core/curl.py
Normal file
@@ -0,0 +1,74 @@
"""curl text parsing and request-sending helpers

Responsibilities:
- Split text containing multiple curl commands into blocks
- Parse the URL and request headers (including Cookie) from each block
- Issue a GET request from the parsed result and try to return JSON
"""
import json
import re
from urllib.request import Request, urlopen


def _split_curl_blocks(text):
    """Split the text into command blocks at each `curl ` keyword."""
    blocks = []
    indices = [m.start() for m in re.finditer(r"\bcurl\s", text)]
    if not indices:
        return blocks
    for i, start in enumerate(indices):
        end = indices[i + 1] if i + 1 < len(indices) else len(text)
        blocks.append(text[start:end])
    return blocks


def _parse_block(block):
    """Parse the URL and headers from a single curl command block.

    Returns `{'url': str, 'headers': dict}`, or None if no URL can be parsed.
    """
    # \1 requires the closing quote to match the opening one, so a value such
    # as `"Chromium";v="142"` inside single quotes is captured whole.
    url_m = re.search(r"curl\s+(['\"])(.*?)\1", block, re.S)
    if not url_m:
        return None
    url = url_m.group(2)
    headers = {}
    for hm in re.finditer(r"-H\s+(['\"])([^:]+):\s*(.*?)\1", block):
        k = hm.group(2).strip()
        v = hm.group(3).strip()
        headers[k.lower()] = v
    # `-b` carries the cookie string in Chrome's "Copy as cURL" output.
    cm = re.search(r"-b\s+(['\"])(.*?)\1", block, re.S)
    if cm:
        headers['cookie'] = cm.group(2)
    return {'url': url, 'headers': headers}


def parse_curl_file(file_path):
    """Read a curl text file and parse it into a list of request descriptions.

    Args: `file_path` is the path to the file.
    Returns a list whose items each contain `url` and `headers`.
    """
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    blocks = _split_curl_blocks(text)
    result = []
    for b in blocks:
        parsed = _parse_block(b)
        if parsed:
            result.append(parsed)
    return result


def fetch_from_curl(file_path, index=0, timeout=30):
    """Pick a parsed request by index and issue a GET.

    Args: `index` selects which curl block to use; `timeout` is the request
    timeout in seconds.
    Returns the body parsed as JSON, or the raw bytes if parsing fails;
    None if `index` is out of range.
    """
    reqs = parse_curl_file(file_path)
    if not reqs or index < 0 or index >= len(reqs):
        return None
    item = reqs[index]
    req = Request(item['url'], headers=item['headers'], method='GET')
    with urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    try:
        return json.loads(data.decode('utf-8', errors='ignore'))
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return data
28
data/1.text
Normal file
@@ -0,0 +1,28 @@
curl 'https://www.tiktok.com/api/comment/list/?WebIdLastTime=1762843898&aid=1988&app_language=en&app_name=tiktok_web&aweme_id=7554313767425985806&browser_language=zh-CN&browser_name=Mozilla&browser_online=true&browser_platform=Win32&browser_version=5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F142.0.0.0%20Safari%2F537.36&channel=tiktok_web&cookie_enabled=true&count=20&cursor=0&data_collection_enabled=true&device_id=7571356851431851534&device_platform=web_pc&focus_state=true&from_page=video&history_len=4&is_fullscreen=false&is_page_visible=true&odinId=7571357724144124941&os=windows&priority_region=US&referer=https%3A%2F%2Fwww.google.com%2F®ion=US&root_referer=https%3A%2F%2Fwww.google.com%2F&screen_height=1080&screen_width=1920&tz_name=Asia%2FShanghai&user_is_login=true&verifyFp=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y&webcast_language=en&msToken=HGxJ50IHbcJxidsAy4biksW4jCfUpjG5IOfoNZd9m24WyL0muFz7f02qUT2A4HCKPQPheRtCr66460XMCJQ9mCplXR1zk1fKK81mU65TLKdczaVqauDay_1qAol348Cg_iQaiK74qPZ0EoJBKu_iZbCATA==&X-Bogus=DFSzsIVuMqxANCCACObDM-ZLJrPe&X-Gnarly=MxLz6L1B1jRWAkdMAhXfoRcZW7o7E89jQUXQepysu7jSC47hCDAgLaFj6ATg13br-ct2WppjvVuo3DrB5foJoo3XOjJH6TVzfVkPLs8Sw47ja/0uC5DB6DPtfPWekO9g9-ZviZeREnpG/N2SRXbqDr0-Go5o0OzoRp9wdRUoLSAM5nbo0niphLjDxyOzdsW/RqxqQNFbBnJJkNqIH4TiXbQmNafqX1Yk5cGaIFH7FHcjZsYRbtA2gc8cXePp2guxR5cXDepaBF2Wgsdmu8VM2q8ed8o7ohQXi56GiUuUXjLJJ130mlhYHJHgYHtxYynbeRw=' \
-H 'accept: */*' \
-H 'accept-language: zh-CN,zh;q=0.9' \
-b 'passport_csrf_token=fc48fa188f4d67baad733476f64baccf; passport_csrf_token_default=fc48fa188f4d67baad733476f64baccf; last_login_method=email; delay_guest_mode_vid=5; living_user_id=480687482446; tiktok_webapp_theme_source=auto; tt_chain_token=tTxp2ztaIaxTaXsGrJb+8Q==; tiktok_webapp_lang=zh-Hans; d_ticket=64bffcf490d4c9c7839c94bfa06f09cf36a43; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763362651%7C2cd9c33c1a5733387a276a04d1af0af2026a80003217b55a841cb13520f1347f; myCookie=rap; fblo_1862952583919182=y; tiktok_webapp_theme=light; multi_sids=7571357724144124941%3Ab0685b23eb2eeb5f5e0a5604801f365b; cmpl_token=AgQQAPNSF-RO0rksMxu3N50083NFwqLP_4zZYKPv8w; sid_guard=b0685b23eb2eeb5f5e0a5604801f365b%7C1763435980%7C15552000%7CSun%2C+17-May-2026+03%3A19%3A40+GMT; uid_tt=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; uid_tt_ss=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; sid_tt=b0685b23eb2eeb5f5e0a5604801f365b; sessionid=b0685b23eb2eeb5f5e0a5604801f365b; sessionid_ss=b0685b23eb2eeb5f5e0a5604801f365b; tt_session_tlb_tag=sttt%7C1%7CsGhbI-su619eClYEgB82W__________pPerzW7vBXrU5jjfVhTeyC35OugALhjpAGeyHj7inp30%3D; sid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; ssid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; store-idc=useast5; store-country-code=us; store-country-code-src=uid; tt-target-idc=useast8; 
tt-target-idc-sign=jh5EgVgJzs2Zhfd2qcTVn1799rd4vK_dkjNT5E1hkY8ey6Iuuuo2qPfonmOJN-73-SEUPvtKH8L04xVHFdfuxGD4jEaT9iJGdE634_1n9RDR1aV8X7xR36LWBtgYwCfK96M28ozQElXgVDFsHS5jNIH0Jfq5gaisBGFWCAz7zEHd3YwWFjSVW96udWsQjGHM_y0UYLKmGEwmNh3nCmKOGntgfvFHrzuxfYL2T6upJ8x8WMb3GG-tGXKw4N05kaWH3LJCY3hGiCOIBX3s8_n0jvn1PLu9yiOUiF2f-K63HqcnVxPsuYfg8iYE47R08TALOuvQE9CFArejv2TMFIjiGKi-MB3BlWwwCUmUSbTyoTLBzSuKT5Elynh0l1JVMxrXlqZn39OMcs8_AB0n_RyyAF3FH9pV0sQ2lza6iJtZim0hmnAuTn8C26lyuss0BP3vlSPtp26rNMyqs1uZlpIclUzI7hmlBV6HNph2l1oBp7QMbvCQLFIGYMJXTL5kP_kX; tt_csrf_token=2alNSmcj-CbgYZfsy7TgGoLanfvaF_S5Ya4M; passport_fe_beating_status=true; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763542066%7C310b419382cc569f618a86d012ca09f9c470719d9734b7ef5e637af0851cc2fc; odin_tt=081974ccd8c05aa99c9def51a2ae7bff2f2eff4b7187eaefc9035f6f29b3cf23b9e7e874eddd540c813b1aeacd524755eeea3b487b1604ab69034829839420a35eb841c2c1660309fb2e8733f8cec76e; store-country-sign=MEIEDEmOSATz1HVkE69LLQQgQ-xv4GcuilURIjdWweEXq86fX22G46h6wFIvL1YxVVUEEOiSOHWEhgwxBRqtF3ZvKh4; s_v_web_id=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y; msToken=HGxJ50IHbcJxidsAy4biksW4jCfUpjG5IOfoNZd9m24WyL0muFz7f02qUT2A4HCKPQPheRtCr66460XMCJQ9mCplXR1zk1fKK81mU65TLKdczaVqauDay_1qAol348Cg_iQaiK74qPZ0EoJBKu_iZbCATA==; msToken=jWvit9euZoPFKgYZamkCxcNRJFbxE2efJZTwjJRsUOFUCqfJXFXMDMp6zFbk7VT7czHFPTMr0hdTmSdJybWT0mwnHYYrra7EQWKq9cwTr0NAdoyzkp-xrBuSlxD6a4Fcai6a72-qqTi1FxkAWjmrliiqwg==' \
-H 'priority: u=1, i' \
-H 'referer: https://www.tiktok.com/@user73001399001191/video/7561810082636287246' \
-H 'sec-ch-ua: "Chromium";v="142", "Google Chrome";v="142", "Not_A Brand";v="99"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Windows"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36'


curl 'https://www.tiktok.com/api/search/general/full/?WebIdLastTime=1762843898&aid=1988&app_language=en&app_name=tiktok_web&browser_language=zh-CN&browser_name=Mozilla&browser_online=true&browser_platform=Win32&browser_version=5.0%20%28Windows%20NT%2010.0%3B%20Win64%3B%20x64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F142.0.0.0%20Safari%2F537.36&channel=tiktok_web&client_ab_versions=70508271%2C72437276%2C73547759%2C73720540%2C74444736%2C74446915%2C74465410%2C74627577%2C74679798%2C74703728%2C74744616%2C74746519%2C74757744%2C74780477%2C74782564%2C74793838%2C74798355%2C74803471%2C74808328%2C74824020%2C74843467%2C74852654%2C74860161%2C74879745%2C74879783%2C74882809%2C74891662%2C74902367%2C74926160%2C74928117%2C74935708%2C74936938%2C74970253%2C74972148%2C74973673%2C74976255%2C74980175%2C74983940%2C74994853%2C75001423%2C75005876%2C70405643%2C70772958%2C71057832%2C71200802%2C71381811%2C71516509%2C71803300%2C71962127%2C72360691%2C72361743%2C72408100%2C72854054%2C72892778%2C73171280%2C73208420%2C73989921%2C74276218%2C74611443%2C74844724&cookie_enabled=true&count=16&data_collection_enabled=true&device_id=7571356851431851534&device_platform=web_pc&device_type=web_h265&focus_state=true&from_page=search&history_len=5&is_fullscreen=false&is_page_visible=true&keyword=%E6%B6%82%E9%B8%A6%E7%BB%98%E7%94%BB&odinId=7571357724144124941&offset=0&os=windows&priority_region=US&referer=https%3A%2F%2Fwww.google.com%2F®ion=US&root_referer=https%3A%2F%2Fwww.google.com%2F&screen_height=1080&screen_width=1920&search_source=search_history&tz_name=Asia%2FShanghai&user_is_login=true&verifyFp=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y&web_search_code=%7B%22tiktok%22%3A%7B%22client_params_x%22%3A%7B%22search_engine%22%3A%7B%22ies_mt_user_live_video_card_use_libra%22%3A1%2C%22mt_search_general_user_live_card%22%3A1%7D%7D%2C%22search_server%22%3A%7B%7D%7D%7D&webcast_language=en&msToken=Evo6YZn35dd6dAsaNUBm7WHaOCixR84Hwjo6DVlNBGE0L56xiDF_dmDWfyJIJGq8LDEsjNm5G9H3uMP9LlV
sCunVwx0lMEnriQWWWuzpN7Xp4j0Fj5wXbgMEqU9KMd5YfkZ1iqFubWhu99nvT06p5qpUeg==&X-Bogus=DFSzsIVu7-iANCCACObDh-ZLJrOb&X-Gnarly=M5J8rVZ10jjW3H5JvTrrLC6MGn7Qq4X0NFfuLZ1UYP2F5Tyem4CUCigEriTnj4Ui3kZdlhNogxKstfzoLHWeSKWiubEdsYZpiegkx-Ot2OUSwbyC9mcwB8T80j7nJpzf6tMOisjjbGiGzYQDJuNrqgxehrCDKUfdA6CbLeoguoGy7XQjTDxmg3/VsSqdhziaenBlm72xVj0GyLUEgrboEwzp11Xphma3Qo8b-/uiMZWQDNyJaC7rcb11dW-ffpSTMrXvf6EU6QXJav2NYvS2gNjMJBhMf15s0-NNQQIC-USLgeAWEo5Wj-gXgn/YmaUd7hZ=' \
-H 'accept: */*' \
-H 'accept-language: zh-CN,zh;q=0.9' \
-b 'passport_csrf_token=fc48fa188f4d67baad733476f64baccf; passport_csrf_token_default=fc48fa188f4d67baad733476f64baccf; last_login_method=email; delay_guest_mode_vid=5; living_user_id=480687482446; tiktok_webapp_theme_source=auto; tt_chain_token=tTxp2ztaIaxTaXsGrJb+8Q==; tiktok_webapp_lang=zh-Hans; d_ticket=64bffcf490d4c9c7839c94bfa06f09cf36a43; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763362651%7C2cd9c33c1a5733387a276a04d1af0af2026a80003217b55a841cb13520f1347f; myCookie=rap; fblo_1862952583919182=y; tiktok_webapp_theme=light; multi_sids=7571357724144124941%3Ab0685b23eb2eeb5f5e0a5604801f365b; cmpl_token=AgQQAPNSF-RO0rksMxu3N50083NFwqLP_4zZYKPv8w; sid_guard=b0685b23eb2eeb5f5e0a5604801f365b%7C1763435980%7C15552000%7CSun%2C+17-May-2026+03%3A19%3A40+GMT; uid_tt=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; uid_tt_ss=b8d8d5d8b062bd2f3badb4e51420dc711f3305fec2a664f03306a64342912e7e; sid_tt=b0685b23eb2eeb5f5e0a5604801f365b; sessionid=b0685b23eb2eeb5f5e0a5604801f365b; sessionid_ss=b0685b23eb2eeb5f5e0a5604801f365b; tt_session_tlb_tag=sttt%7C1%7CsGhbI-su619eClYEgB82W__________pPerzW7vBXrU5jjfVhTeyC35OugALhjpAGeyHj7inp30%3D; sid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; ssid_ucp_v1=1.0.1-KDVjMGU0Y2Q1OTg2ZjI0ODI2MjI2NWFmZDY0YWYyMmE0ZTkzY2M5M2QKIgiNiJX4w7e3iWkQzMvvyAYYswsgDDDgu8vIBjgBQOoHSAQQBBoHdXNlYXN0NSIgYjA2ODViMjNlYjJlZWI1ZjVlMGE1NjA0ODAxZjM2NWIyTgogV1B2-bFMjyKUv3NQtyvtMGmk7ui-Sl9aVTQStE9c4gkSIMJacGh0QuV1QyQEcXmFWpSTFvdWSbfDoe3vFgrRtb2YGAEiBnRpa3Rvaw; store-idc=useast5; store-country-code=us; store-country-code-src=uid; tt-target-idc=useast8; 
tt-target-idc-sign=jh5EgVgJzs2Zhfd2qcTVn1799rd4vK_dkjNT5E1hkY8ey6Iuuuo2qPfonmOJN-73-SEUPvtKH8L04xVHFdfuxGD4jEaT9iJGdE634_1n9RDR1aV8X7xR36LWBtgYwCfK96M28ozQElXgVDFsHS5jNIH0Jfq5gaisBGFWCAz7zEHd3YwWFjSVW96udWsQjGHM_y0UYLKmGEwmNh3nCmKOGntgfvFHrzuxfYL2T6upJ8x8WMb3GG-tGXKw4N05kaWH3LJCY3hGiCOIBX3s8_n0jvn1PLu9yiOUiF2f-K63HqcnVxPsuYfg8iYE47R08TALOuvQE9CFArejv2TMFIjiGKi-MB3BlWwwCUmUSbTyoTLBzSuKT5Elynh0l1JVMxrXlqZn39OMcs8_AB0n_RyyAF3FH9pV0sQ2lza6iJtZim0hmnAuTn8C26lyuss0BP3vlSPtp26rNMyqs1uZlpIclUzI7hmlBV6HNph2l1oBp7QMbvCQLFIGYMJXTL5kP_kX; tt_csrf_token=2alNSmcj-CbgYZfsy7TgGoLanfvaF_S5Ya4M; passport_fe_beating_status=true; ttwid=1%7C28VHivouaKIrGON9d-ZudJgYZUKezdC-9xuqts9saLQ%7C1763542066%7C310b419382cc569f618a86d012ca09f9c470719d9734b7ef5e637af0851cc2fc; odin_tt=081974ccd8c05aa99c9def51a2ae7bff2f2eff4b7187eaefc9035f6f29b3cf23b9e7e874eddd540c813b1aeacd524755eeea3b487b1604ab69034829839420a35eb841c2c1660309fb2e8733f8cec76e; store-country-sign=MEIEDEmOSATz1HVkE69LLQQgQ-xv4GcuilURIjdWweEXq86fX22G46h6wFIvL1YxVVUEEOiSOHWEhgwxBRqtF3ZvKh4; s_v_web_id=verify_mi5rex4u_QQ3WUuF2_qkrc_4zFV_9lPB_6ZiBsQO9Yg1Y; msToken=imJTbrm-SxNJWwT4U4KOTLwsN5UnjQQzm-bHzXVVdiKnymtLbyQXbM_dziPgdrG3rcvbHJf_WWoBIuJ8AegguxYLz_gA0dQNa8suc7aNA3RA_Z7FqCUJd7iO4TJX5lU3dG0Ahchd28z0ip0HADyOygq2sQ==; msToken=Na9bB8_PXmSHmEprxHatJ3iAi6DYMS4DGaKymzxCSv5ho-vkxkGi2Oh4LHpL3LntQloywKO0p5gTBoGjA3BKW7uguLGHfS6FiTPzo5JkbwAMPkYyGdaoQh3yikWSJFGPNMpdrwaN8-ta5IcfWZlfv_QQcQ==' \
-H 'priority: u=1, i' \
-H 'referer: https://www.tiktok.com/search?q=%E6%B6%82%E9%B8%A6%E7%BB%98%E7%94%BB&t=1763542137936' \
-H 'sec-ch-ua: "Chromium";v="142", "Google Chrome";v="142", "Not_A Brand";v="99"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Windows"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36'
BIN
data/__pycache__/store.cpython-312.pyc
Normal file
Binary file not shown.
88957
data/comments.csv
Normal file
File diff suppressed because one or more lines are too long
5548
data/key_comment.csv
Normal file
File diff suppressed because it is too large
596
data/keyword.txt
Normal file
@@ -0,0 +1,596 @@
direct liquid soft head acrylic marker pen
guangna direct liquid soft head acrylic marker pen
guangna direct liquid soft head acrylic marker pen 24 colors
@huangaa_3 guangna direct liquid soft head acrylic marker pen
how to draw with water based markers
acrylic marker holder
acrylic pen markers
liquid acrylic pens
direct liquid pen
direct liquid acrylic marker pen japan
direct liquid acrylic marker pen japanese
direct liquid acrylic marker pen jumbo
direct liquid acrylic marker pen
acrylic marker pens
liquitex acrylic markers review
acrylic marker art
liquitex acrylic marker
direct liquid acrylic marker pen instructions
direct liquid acrylic marker pen ink
direct liquid acrylic marker pen ii
direct liquid acrylic marker pen instructions pdf
direct liquid acrylic marker pen in usa
direct liquid acrylic marker pen msds
direct liquid acrylic marker pen markers
direct liquid acrylic marker pen michaels
direct liquid acrylic marker pen msds sheet
direct liquid acrylic marker pen large
direct liquid acrylic marker pen lowes
direct liquid acrylic marker pen label
direct liquid acrylic marker pen liquid
direct liquid acrylic marker pen light
direct liquid acrylic marker pen liner
direct liquid acrylic marker pen liner review
what pen to use to write on acrylic
liquid marker pen
direct liquid acrylic marker pen kit
direct liquid acrylic marker pen kit instructions
direct liquid acrylic marker pen orange
direct liquid acrylic marker pen only
direct liquid acrylic marker pen on amazon
direct liquid acrylic marker pen oil
direct liquid acrylic marker pen on sale
direct liquid acrylic marker pen off clothes
direct liquid acrylic marker pen off golf balls
direct liquid acrylic marker pen on walls
direct liquid acrylic marker pen pack
direct liquid acrylic marker pen pink
direct liquid acrylic marker pen pen
direct liquid acrylic marker pen purple
direct liquid acrylic marker pen price
direct liquid acrylic marker pen paper mate
direct liquid acrylic marker pen quality
direct liquid acrylic marker pen quick dry
direct liquid acrylic marker pen qvc
direct liquid acrylic marker pen quizlet
direct liquid acrylic marker pen quiz
direct liquid acrylic marker pen refill
direct liquid acrylic marker pen review
direct liquid acrylic marker pen reddit
direct liquid acrylic marker pen red
direct liquid acrylic marker pen target
direct liquid acrylic marker pen tip
direct liquid acrylic marker pen type
direct liquid acrylic marker pen tint
direct liquid acrylic marker pen use
direct liquid acrylic marker pen usa
direct liquid acrylic marker pen us
direct liquid acrylic marker pen walmart
direct liquid acrylic marker pen white
direct liquid acrylic marker pen wholesale
direct liquid acrylic marker pen waterproof
direct liquid acrylic marker pen walgreens
direct liquid acrylic marker pen video
direct liquid acrylic marker pen vintage
direct liquid acrylic marker pen vs
direct liquid acrylic marker pen volume
direct liquid acrylic marker pen vs regular
direct liquid acrylic marker pen vape
direct liquid acrylic marker pen xl
direct liquid acrylic marker pen x2
acrylic marker diy
diy acrylic paint markers
direct liquid acrylic marker pen zoom
direct liquid acrylic marker pen zero
direct liquid acrylic marker pen zipper
direct liquid acrylic marker pen zip
direct liquid acrylic marker pen zoominfo
acrylic marker paper
acrylic marker painting
acrylic liquid pens
acrylic markers blick
what markers can you use on acrylic
are liquid chalk markers permanent
acrylic marker edding
direct liquid acrylic marker pen amazon
direct liquid acrylic marker pen amazon prime
direct liquid acrylic marker pen acrylic
direct liquid acrylic marker pen app
direct liquid acrylic marker pen art
acrylic marker fine tip
acrylic paint markers fine tip
are sharpie gel pens waterproof
liqui-mark gel pens
acrylic marker graffiti
acrylic paint markers how to use
acrylic ink marker
krink acrylic markers
permanent marker on acrylic plastic
liqui-mark permanent markers
liquitex acrylic paint markers
marker acrylic
acrylic paint marker waterproof
acrylic pen marker
direct liquid acrylic marker pen black
direct liquid acrylic marker pen bulk
direct liquid acrylic marker pen blue
direct liquid acrylic marker pen brand
direct liquid acrylic marker pen brand name
direct liquid acrylic marker pen bleeding through paper
direct liquid acrylic marker pen bleeding through paint
direct liquid acrylic marker pen brush photoshop
oil based marker on acrylic paint
acrylic markers on plastic
acrylic paint marker refill
are sharpies water based
are sharpies oil or water based
acrylic marker liquitex
liquitex acrylic pen
direct liquid acrylic marker pen directions
direct liquid acrylic marker pen dollar tree
direct liquid acrylic marker pen dispenser
direct liquid acrylic marker pen directions for use
direct liquid acrylic marker pen dry
direct liquid acrylic marker pen disguises
direct liquid acrylic marker pen depot
direct liquid acrylic marker pens fine line
direct liquid acrylic marker pens fine point
direct liquid acrylic marker pens fine tip
direct liquid acrylic marker pens for painting
direct liquid acrylic marker pen ebay
direct liquid acrylic marker pen ewg
direct liquid acrylic marker pen elite
direct liquid acrylic marker pen elite review
direct liquid acrylic marker pen elite vaporizer
uni acrylic markers
using acrylic markers
what marker to use on acrylic
what type of marker to use on acrylic
what marker works on plastic
where to buy acrylic markers
where to buy acrylic paint pens
are acrylic markers permanent
acrylic marker uses
acrylic marker tutorial
can you use permanent marker on acrylic paint
who acrylic markers
who sells liquitex acrylic paint
are acrylic markers water based
is acrylic liquid monomer
are acrylic pens oil based
are acrylic paint markers waterproof
can you use permanent marker on acrylic
can dry erase markers be used on acrylic
can acrylic markers be used on fabric
acrylic vs oil marker
acrylic markers vs acrylic paint
acrylic paint marker vs oil paint marker
are acrylic paint markers permanent
will dry erase markers work on plexiglass
will permanent marker stick to plastic
will permanent marker stay on silicone
does dry erase markers work on plexiglass
worst acrylic paint
worst acrylic paint brands
worst acrylic powder
art supplies acrylic marker testing comparison
acrylic paint markers permanent
do dry erase markers work on acrylic
do acrylic markers work on fabric
do acrylic paint pens work on plastic
do dry erase markers work on plexiglass
best acrylic markers
best acrylic markers for artists
best acrylic paint pens for plastic
top rated acrylic paint markers
top rated acrylic paint pens
can acrylic paint markers be used on canvas
acrylic marker pens
acrylic marker pen set
acrylic marker pen price
acrylic marker pen uses
acrylic marker pen drawing
acrylic marker pens uk
acrylic marker pen black
acrylic marker pen 24 shades
acrylic marker pen art
acrylic marker pen painting
acrylic marker pens hobbycraft
acrylic marker pen amazon
acrylic marker pen mr diy
acrylic marker pen for fabric
acrylic marker pen how to use
acrylic pen and marker holder
acrylic marker pen himic
acrylic marker pens home bargains
acrylic marker pens hobby lobby
acrylic paint pen hobby lobby
acrylic paint pen holder
acrylic paint pen hobbycraft
acrylic paint pen home depot
acrylic paint pen how to use
acrylic paint pens home bargains
acrylic paint pens hs code
best acrylic paint pens hobbycraft
acrylic marker pen gold
acrylic marker pens guangna
acrylic paint pen gold
acrylic paint pen green
acrylic paint pen grey
acrylic paint pen glass
acrylic paint pen grey set
acrylic paint pen golf ball
acrylic paint pen graffiti
acrylic paint pens grabie
acrylic paint pens glitter
acrylic paint pens guangna
acrylic paint pens gundam
acrylic paint pens gunpla
acrylic paint pen ideas
acrylic paint pen ideas for beginners
acrylic paint pen icon
acrylic paint pens ireland
acrylic paint pens in national bookstore
acrylic paint pens in store
acrylic paint pens india
acrylic paint in pen
acrylic paint pen art ideas
acrylic paint pen drawing ideas
easy acrylic paint pen ideas
which acrylic paint pen is best
acrylic paint pen craft ideas
acrylic paint pens ak interactive
acrylic marker pen flair
acrylic marker pen faber castell
acrylic marker pen fine tip
acrylic marker pen for kids
acrylic marker pen flipkart
acrylic marker pens for glass
acrylic paint pen fine tip
acrylic paint pen for fabric
acrylic paint pen for wood
acrylic paint pen flowers
acrylic paint pen for art and crafts
acrylic paint pen for canvas
acrylic paint pen for glass
acrylic marker pen shopee
acrylic paint pen set
acrylic paint pen storage
acrylic marker sketch pen
acrylic paint pen set uk
acrylic paint pen silver
acrylic paint pen sealer
acrylic paint pen spotlight
|
||||||
|
acrylic paint pen set nearby
|
||||||
|
acrylic paint sketch pen
|
||||||
|
acrylic paint marker pen set
|
||||||
|
acrylic marker 12 pen set flair brand
|
||||||
|
acrylic marker pen posca
|
||||||
|
acrylic marker pen popular
|
||||||
|
acrylic marker pen peak
|
||||||
|
acrylic paint pen projects
|
||||||
|
acrylic paint pen posca
|
||||||
|
acrylic paint pen price
|
||||||
|
acrylic paint pen painting
|
||||||
|
acrylic paint pen pink
|
||||||
|
acrylic paint pens pna
|
||||||
|
acrylic paint pens permanent
|
||||||
|
acrylic paint pens pastel
|
||||||
|
acrylic paint pens professional
|
||||||
|
acrylic paint pen joann
acrylic paint pens jumbo
acrylic pen ideas
what pens write on acrylic
do acrylic paint pens work on plastic
what are acrylic paint pens used for
acrylic marker pen ohuhu
acrylic paint pen on fabric
acrylic paint pen on glass
acrylic paint pen officeworks
acrylic paint pen on canvas
acrylic paint pen on wood
acrylic paint pen on plastic
acrylic paint pen on mirror
acrylic paint pen on metal
acrylic paint pen on leather
acrylic paint pen organizer
acrylic paint pen on skin
acrylic paint pen on shirt
acrylic paint pen on ceramic
acrylic marker pen
acrylic marker pen white
acrylic marker pen near me
acrylic paint pens video
acrylic paint pens vs markers
acrylic paint pen vs sharpie
sharpie acrylic paint pens vs posca
acrylic paint pens vs alcohol markers
acrylic pen vs marker
acrylic pen vs permanent marker
acrylic marker vs brush pen
acrylic marker vs color pen
acrylic marker vs gel pen
acrylic paint pens velles
acrylic marker pen under ₹ 100
acrylic marker pen under 200
acrylic marker pen under ₹ 200
acrylic marker pen under ₹ 300
acrylic marker pen under 100
acrylic marker pen under ₹ 400
acrylic paint pen uses
acrylic paint pen ultra fine
acrylic paint pen uk
acrylic paint pens ultra fine tip
acrylic paint pens uk amazon
best acrylic marker pens uk
acrylic marker pens kmart
acrylic paint pen kmart
acrylic paint pen kits
acrylic paint pens kids
acrylic paint pens kuwait
acrylic paint pens nz kmart
acrylic paint dot pens kmart
kokuyo camlin acrylic marker pen
what are acrylic markers used for
does permanent marker stay on acrylic
acrylic paint pens not working
acrylic pens how to use
acrylic marker pen meesho
acrylic marker pen malaysia
acrylic marker pens michaels
acrylic paint pen michaels
acrylic paint pen mont marte
acrylic paint pen metallic
acrylic paint pen molotow
acrylic paint pen mug
acrylic paint pens miniatures
acrylic paint pens mr price
acrylic paint pens medium tip
acrylic paint pens mitre 10
acrylic paint pens michaels nearby
acrylic marker pen quality
acrylic marker pen quiz
acrylic marker pen quick dry
acrylic marker pen qvc
acrylic marker pen quick release
artecho acrylic marker pen
artecho dual tip acrylic marker pen
arrtx acrylic marker pen
acrylic marker vs acrylic paint pen
acrylic paint marker pen amazon
difference between acrylic marker and brush pen
akarued white paint pen acrylic marker
acrylic marker brush pen amazon
marker acrylic pen allegro
acrylic marker and brush pen
is an acrylic marker a paint pen
best acrylic marker pen
black acrylic marker pen
brustro acrylic marker pen
best white acrylic marker pen
baoke acrylic marker pen
acrylic marker brush pen
brush pen vs acrylic marker
acrylic brush marker pen set
acrylic paint marker brush pen
acrylic paint marker pen black
acrylic marker pen box
acrylic paint marker calligraphy brush pen
best acrylic paint marker pen
acrylic marker pens the works
acrylic marker pens the range
acrylic marker pens tesco
acrylic paint pen tooli art
acrylic paint pen tutorial
acrylic paint pen target
acrylic paint pen techniques
acrylic paint pen tips
acrylic paint pen thin
acrylic paint pen thick
acrylic paint pen the range
acrylic paint pen tool art
acrylic paint pen temu
acrylic paint pens the works
acrylic paint pen white
acrylic paint pen walmart
acrylic paint pen water based
acrylic paint pen waterproof
acrylic paint pen warhammer
acrylic paint pen white michaels
acrylic marker with pen
acrylic paint pen washable
acrylic paint pen wood
acrylic paint pens warehouse
acrylic paint pens with brush tip
whsmith acrylic paint pens
acrylic paint pens wholesale
acrylic marker pen refill
acrylic marker pen review
acrylic paint pen refillable
acrylic paint pen reviews
acrylic paint pen removal
acrylic paint pen red
acrylic paint pen reddit
acrylic paint pens range
acrylic paint pens rocks
acrylic paint pens rymans
acrylic paint pens reject shop
acrylic paint pens red dot
acrylic paint pen for resin
acrylic marker pen xl
acrylic marker pen xray
acrylic marker pen xtool
guangna direct liquid soft head acrylic marker pen
gold acrylic marker pen
grasp acrylic marker pen
guangna acrylic marker pen
grabie acrylic marker pen
languo acrylic marker gel pen
acrylic marker brush pen guangna
marker acrylic pen m&g
goffi acrylic paint marker pen
doloha acrylic.marker pen
deli acrylic marker pen
doms acrylic marker pen
dual tip acrylic marker pen
direct liquid soft head acrylic marker pen
direct liquid acrylic marker pen
dual tip acrylic paint pen marker
double.sided acrylic pen marker
double sided acrylic pen marker set of 24
acrylic marker pen diy
dual tip acrylic paint pen marker - 24/48/72 colours
beyond draw-dual-tip-acrylic-paint-pen-marker
fine tip acrylic marker pen
flair acrylic marker pen
marker pen for acrylic painting
marker pen for acrylic board
acrylic marker pen used for
marker pen for acrylic
white marker pen for acrylic
flair acrylic paint marker pen
ohuhu acrylic marker pen for diy
camel acrylic marker pen
carissa acrylic marker pen
camlin acrylic marker pen
acrylic colour marker pen
acrylic marker pen china
acrylic marker pen in chinese
acrylic marker brush pen 80 cores
sharpie creative marker acrylic paint pen
acrylic marker brush pen 60 crore
caneta acrylic marker brush pen
miya acrylic marker pen
metallic acrylic marker pen
led acrylic writing message board night lamp with marker pen
acrylic marker brush pen mercado livre
b&m acrylic marker pens
sketching pens & markers acrylic marker pen
marcadores acrylic marker pen
set acrylic marker pen
soft head acrylic marker pen
silver acrylic marker pen
acrylic marker 12 pen set
acrylic marker pen 48 shades
acrylic marker pen 36 shades
acrylic marker pen 12 shades
pen peak acrylic marker pen
price acrylic marker pen
posca acrylic marker pen
paint acrylic marker pen
acrylic permanent marker pen
wotek acrylic paint marker pen
acrylic paint marker pen white
acrylic marker vs paint pen
thick acrylic marker pen
the acrylic paint marker pen set
acrylic tip marker pen
is an acrylic marker the same as a paint pen
how to use acrylic marker pen
enmy acrylic marker pen
emmy acrylic marker pen
marker acrylic pen empik
what are acrylic markers
are there acrylic paint pens
how to use acrylic markers
nicety acrylic marker pen
acrylic marker pen nearby
acrylic pen marker national bookstore
what markers can you use on acrylic
ohuhu acrylic marker pen
ohuhu acrylic marker pen price
unicorn acrylic marker pen
what is acrylic marker pen
what are acrylic paint markers used for
white acrylic marker pen
@huangaa_3 guangna direct liquid soft head acrylic marker pen
hightune acrylic marker brush pen
liquid soft head acrylic marker pen
how to use acrylic marker
how do acrylic markers work
languo acrylic marker pen
liquid acrylic marker pen
best acrylic marker pens
best acrylic paint marker pens
what are the best acrylic markers
what is the best acrylic paint pens
what are the best acrylic pens
acrylic marker pen ideas
acrylic marker pen price in bangladesh
is sharpie acrylic
acrylic paint pen zeyar
acrylic paint pens new zealand
acrylic marker brush pen zjw
acrylic marker pen zuixua
restly acrylic marker pen
how long do acrylic paint pens last
are acrylic paint pens permanent
are sharpies acrylic
acrylic paint pens lyuvie
acrylic paint pens like posca
acrylic paint pens liquitex
acrylic paint pens life of colour
acrylic paint pens lowes
acrylic paint pens large
acrylic paint pens large set
acrylic paint pens languo
acrylic paint pens for leather shoes
acrylic marker pen best
acrylic marker pen blinkit
acrylic marker pens b&m
acrylic paint pen black
acrylic paint pen brush tip
acrylic paint pen brands
acrylic paint pen brush
acrylic paint pen bunnings
acrylic paint pen brown
acrylic paint pen big w
acrylic paint pen blue
acrylic paint pen by numbers
acrylic marker pen doms
acrylic marker pen deli
acrylic marker pen dual tip
acrylic paint pen drawing
acrylic paint pen drying time
acrylic paint pen designs
acrylic paint pen doodles
acrylic paint pen dry
dual tip acrylic paint.pen
acrylic paint pens dried out
acrylic paint pens desire deluxe
acrylic marker pen camlin
acrylic marker pen camel
acrylic paint pen coloring book
acrylic paint pen crafts
acrylic paint pen case
acrylic paint pen canvas
acrylic marker colour pen
acrylic paint pen car
acrylic paint pen colouring book
acrylic paint pen canada
acrylic paint pens cheap
acrylic paint pens crockd
acrylic marker pen enmy
acrylic marker pens ebay
acrylic paint pen extra fine
acrylic paint pen extra fine tip
acrylic paint pen empty
acrylic paint pens ebay
acrylic paint pens earth tones
acrylic paint pens eckersley
acrylic paint pens enmy
bia acrylic paint pen extra fine tip
sharpie acrylic paint pens earth tones
artistro acrylic paint pens extra fine tip
acrylic paint pens double ended
acrylic marker pens arrtx
acrylic marker pens arrtx simptap
acrylic paint pen art
acrylic paint pen amazon
acrylic paint pen artwork
acrylic paint pens australia
acrylic paint pens argos
acrylic paint pens asda
acrylic paint pens artistro
acrylic paint pens arrtx
acrylic paint pens at michaels
18
data/links.json
Normal file
@@ -0,0 +1,18 @@
{
  "keyword": "马克笔绘画",
  "count": 12,
  "links": [
    "https://www.tiktok.com/@drawing_board8/video/7569235583214587150",
    "https://www.tiktok.com/@seekingartsupplier_my/video/7259632291306032402",
    "https://www.tiktok.com/@huangaa_3/video/7522044666745818382",
    "https://www.tiktok.com/@acrylicmarkerasmr/video/7470837517571345695",
    "https://www.tiktok.com/@fungraffiti/video/7550422681103895863",
    "https://www.tiktok.com/@yzd20328cuq/video/7553666411760012574",
    "https://www.tiktok.com/@acrylicmarkerasmr/video/7495580197811440926",
    "https://www.tiktok.com/@miss.uk3/video/7561345033010416916",
    "https://www.tiktok.com/@nashvibes/video/7472932371990416670",
    "https://www.tiktok.com/@muse1378/video/7322534704643722539",
    "https://www.tiktok.com/@miss.uk3/video/7567258402661960981",
    "https://www.tiktok.com/@miss.uk3/video/7569627806678846740"
  ]
}
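The `data/links.json` snapshot is plain JSON, so it can be read back with the standard library. The following is a minimal sketch under stated assumptions: the inline sample copies two links from `data/links.json`, the helper name `video_ids` is hypothetical, and the `/video/(\d+)` pattern is the one `_extract_aweme_id` uses in `tiktok/comments.py`.

```python
import json
import re

# Inline sample with the same shape as data/links.json (two links copied from it).
SNAPSHOT = """
{
  "keyword": "马克笔绘画",
  "count": 2,
  "links": [
    "https://www.tiktok.com/@drawing_board8/video/7569235583214587150",
    "https://www.tiktok.com/@huangaa_3/video/7522044666745818382"
  ]
}
"""


def video_ids(snapshot):
    """Hypothetical helper: return the numeric ID of every link containing /video/<digits>."""
    ids = []
    for link in snapshot.get("links", []):
        m = re.search(r"/video/(\d+)", link)
        if m:
            ids.append(m.group(1))
    return ids


snapshot = json.loads(SNAPSHOT)
ids = video_ids(snapshot)
print(ids)
```

Links without a `/video/<digits>` segment are skipped rather than raising, which matches how `_extract_aweme_id` returns `None` for them.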
25
data/store.py
Normal file
@@ -0,0 +1,25 @@
"""Snapshot writers.

Responsibilities:
- Write link-search results to JSON in one unified structure.
- Write comment-crawl results to JSON.

This module only serializes data; it contains no business logic.
"""
from utils.io import write_json


def save_links_snapshot(path, keywords, items, links):
    """Write the links snapshot.

    Structure: `{'keywords': list, 'items': list, 'total_count': int, 'links': list}`
    """
    # Materialize first so that total_count also works for iterables without len(), such as generators.
    links = all_links(links)
    write_json(path, {'keywords': list(keywords), 'items': items, 'total_count': len(links), 'links': links})
    return path


def save_comments_snapshot(path, items):
    """Write the comments snapshot: `{'items': items}`."""
    write_json(path, {'items': items})
    return path


def all_links(links):
    """Convert any iterable of links to a list (for JSON serialization)."""
    return list(links)
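`write_json` lives in `utils/io.py`, which is not part of this hunk, so the sketch below substitutes a hypothetical stand-in built on `json.dump` to show the shape of the file that `save_links_snapshot` produces. The sample keyword, the placeholder link, and the temporary output path are illustrative assumptions, not data from the repository.

```python
import json
import os
import tempfile


def write_json(path, obj):
    """Hypothetical stand-in for utils.io.write_json (the real helper is not shown here)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False, indent=2)


def save_links_snapshot(path, keywords, items, links):
    """Same structure as data/store.py: keywords, items, total_count, links."""
    links = list(links)  # accept any iterable, including generators
    write_json(path, {"keywords": list(keywords), "items": items,
                      "total_count": len(links), "links": links})
    return path


# One per-keyword entry with a placeholder link (not a real video).
items = [{"keyword": "acrylic marker pen", "count": 1,
          "links": ["https://www.tiktok.com/@user/video/1"]}]
merged = sorted({link for it in items for link in it["links"]})

out = os.path.join(tempfile.mkdtemp(), "urls.json")
save_links_snapshot(out, ["acrylic marker pen"], items, iter(merged))

with open(out, encoding="utf-8") as f:
    snapshot = json.load(f)
print(snapshot["total_count"], snapshot["links"])
```

Passing `iter(merged)` shows why the links are turned into a list before `len()` is taken: an iterator has no length of its own.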
8914177
data/tik_comments.json
Normal file
File diff suppressed because one or more lines are too long
39884
data/urls.json
Normal file
File diff suppressed because it is too large
Load Diff
BIN
db/__pycache__/mysql_import.cpython-312.pyc
Normal file
Binary file not shown.
49
db/mysql_import.py
Normal file
@@ -0,0 +1,49 @@
import csv
import os


def import_csv_to_mysql(csv_path, host='localhost', port=3306, user='root', password='', database='crawler_tiktok', table='comments'):
    # pymysql is an optional third-party dependency, so it is imported only when needed.
    try:
        import pymysql
    except ImportError:
        print('missing dependency: pip install pymysql', flush=True)
        raise SystemExit(1)
    if not os.path.exists(csv_path):
        print('csv not found: ' + csv_path, flush=True)
        raise SystemExit(1)
    conn = pymysql.connect(host=host, port=int(port), user=user, password=password, database=database, charset='utf8mb4')
    cur = conn.cursor()
    # `table` is interpolated into the SQL text, so it must come from a trusted source.
    cur.execute(f"CREATE TABLE IF NOT EXISTS `{table}` (\n `id` BIGINT AUTO_INCREMENT PRIMARY KEY,\n `username` VARCHAR(255),\n `text` TEXT\n ) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci")
    rows = []
    with open(csv_path, 'r', encoding='utf-8', newline='') as f:
        r = csv.reader(f)
        first = True
        for row in r:
            # Skip the header row if the file starts with one.
            if first and row and row[0].lower() == 'username':
                first = False
                continue
            first = False
            if not row:
                continue
            username = row[0] if len(row) > 0 else ''
            text = row[1] if len(row) > 1 else ''
            rows.append((username, text))
    if rows:
        cur.executemany(f"INSERT INTO `{table}` (`username`,`text`) VALUES (%s,%s)", rows)
    conn.commit()
    cur.close()
    conn.close()
    print(f"inserted={len(rows)}", flush=True)


def create_database_if_not_exists(host='localhost', port=3306, user='root', password='', database='yunque'):
    try:
        import pymysql
    except ImportError:
        print('missing dependency: pip install pymysql', flush=True)
        raise SystemExit(1)
    # Connect without selecting a database, since it may not exist yet.
    conn = pymysql.connect(host=host, port=int(port), user=user, password=password, charset='utf8mb4')
    cur = conn.cursor()
    cur.execute(f"CREATE DATABASE IF NOT EXISTS `{database}` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci")
    conn.commit()
    cur.close()
    conn.close()
    print(f"database_ready={database}", flush=True)
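Both functions in `db/mysql_import.py` splice `table` and `database` into the SQL text with an f-string, because `%s` placeholders bind values only, not identifiers. One possible guard, sketched here as an assumption rather than as part of the project, is to accept only plain identifiers before quoting them. The helper name `safe_identifier` and the allowed character set (ASCII letters, digits, underscore, at most 64 characters) are choices made for this example; MySQL itself permits a wider range.

```python
import re

# Assumption for this sketch: a deliberately narrow whitelist, stricter than MySQL's own rules.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def safe_identifier(name):
    """Hypothetical helper: return name wrapped in backticks, or raise ValueError."""
    if not isinstance(name, str) or not _IDENT.fullmatch(name):
        raise ValueError("unsafe SQL identifier: %r" % (name,))
    return "`" + name + "`"


# The row values still go through %s placeholders; only the table name is checked here.
insert_sql = "INSERT INTO " + safe_identifier("comments") + " (`username`,`text`) VALUES (%s,%s)"
print(insert_sql)
```

A name containing a backtick, a space, or a semicolon is rejected before any SQL is built, so a value passed through `--table` or `--database` could not break out of the quoted identifier.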
185
main.py
Normal file
@@ -0,0 +1,185 @@
"""Command-line entry point.

Subcommands:
- links: search video links for the given keywords concurrently and save a snapshot
- comments: crawl comments and replies for a list of links and save a snapshot and a CSV
- all: chain links and comments to run the whole flow in one go
- mysql: import the comments CSV into a MySQL table
- mysql-db: create the MySQL database if it does not exist
Running it as `python -m crawler_tiktok.main ...` is recommended to avoid import-path problems.
"""
import argparse
import os
from utils.io import load_keywords_from_file, read_json
from tiktok.search import save_links_multi
from tiktok.comments import save_comments_from_links
from db.mysql_import import import_csv_to_mysql, create_database_if_not_exists


def run_links(args):
    """Run the link-collection stage.

    Arguments come from the command line (keywords, request file, paging, concurrency, and so on).
    Steps:
    1. Merge keywords from --keyword / --keywords / --keywords-file
    2. Check that the list is not empty
    3. Call `save_links_multi` to search concurrently, de-duplicate, and save to `args.out`
    """
    kws = []
    if args.keyword:
        kws.extend([k for k in args.keyword if k])
    if args.keywords:
        for k in args.keywords.split(','):
            k = k.strip()
            if k:
                kws.append(k)
    if args.keywords_file:
        kws.extend(load_keywords_from_file(args.keywords_file))
    kws = [k for k in kws if k]
    if not kws:
        raise SystemExit('no keywords')
    save_links_multi(kws, out_path=args.out, file_path=args.file_path, max_pages=args.max_pages, timeout=args.timeout, count=args.count, workers=args.workers)


def run_comments(args):
    """Run the comment and reply crawl stage.

    Input: `args.links_json` (either the unified snapshot or a simpler structure)
    Lookup: prefer the `links` field; if it is missing, merge `items[*].links`
    Call: `save_comments_from_links` crawls concurrently and writes JSON plus an optional CSV
    """
    obj = read_json(args.links_json)
    links = obj.get('links') or []
    if not links and os.path.exists(args.links_json):
        # Fall back to merging the per-keyword lists stored under items[*].links.
        try:
            for name in ['links', 'items']:
                if name == 'items':
                    tmp = []
                    for it in obj.get('items', []):
                        tmp.extend(it.get('links', []))
                    links = tmp
                    break
        except Exception:
            pass
    if not links:
        raise SystemExit('no links')
    save_comments_from_links(links, out_path=args.out, file_path=args.file_path, count=args.count, pages=args.pages, timeout=args.timeout, reply_count=args.reply_count, reply_pages=args.reply_pages, total_limit=args.limit, reply_total_limit=args.reply_limit, csv_path=args.csv, workers=args.workers)


def run_all(args):
    """Run link collection and comment crawling back to back.

    1. Parse the keywords and run the search stage, writing to `args.links_out`
    2. Read the links snapshot, accepting both structures
    3. Run the comment crawl stage, writing to `args.comments_out` and optionally a CSV
    Intended for running the whole pipeline in one go.
    """
    kws = []
    if getattr(args, 'keyword', None):
        kws.extend([k for k in args.keyword if k])
    if getattr(args, 'keywords', None):
        for k in args.keywords.split(','):
            k = k.strip()
            if k:
                kws.append(k)
    if getattr(args, 'keywords_file', None):
        kws.extend(load_keywords_from_file(args.keywords_file))
    kws = [k for k in kws if k]
    if not kws:
        raise SystemExit('no keywords')
    save_links_multi(kws, out_path=args.links_out, file_path=args.file_path, max_pages=args.search_max_pages, timeout=args.search_timeout, count=args.search_count, workers=args.search_workers)
    obj = read_json(args.links_out)
    links = obj.get('links') or []
    if not links and os.path.exists(args.links_out):
        # Same fallback as in run_comments: merge items[*].links.
        try:
            for name in ['links', 'items']:
                if name == 'items':
                    tmp = []
                    for it in obj.get('items', []):
                        tmp.extend(it.get('links', []))
                    links = tmp
                    break
        except Exception:
            pass
    if not links:
        raise SystemExit('no links')
    save_comments_from_links(links, out_path=args.comments_out, file_path=args.file_path, count=args.comments_count, pages=args.comments_pages, timeout=args.comments_timeout, reply_count=args.reply_count, reply_pages=args.reply_pages, total_limit=args.comments_limit, reply_total_limit=args.reply_limit, csv_path=args.csv, workers=args.comments_workers)


def main():
    """Parse the command line and dispatch to the matching subcommand function."""
    p = argparse.ArgumentParser()
    sub = p.add_subparsers(dest='cmd')
    p_links = sub.add_parser('links')
    p_links.add_argument('--keyword', action='append')
    p_links.add_argument('--keywords', default=None)
    p_links.add_argument('--keywords-file', default=None)
    p_links.add_argument('--file-path', default=r'data\1.text')
    p_links.add_argument('--out', default='data\\urls.json')
    p_links.add_argument('--max-pages', type=int, default=50)
    p_links.add_argument('--count', type=int, default=None)
    p_links.add_argument('--timeout', type=int, default=30)
    p_links.add_argument('--workers', type=int, default=5)
    p_links.set_defaults(func=run_links)

    p_comments = sub.add_parser('comments')
    p_comments.add_argument('--links-json', default='data\\urls.json')
    p_comments.add_argument('--out', default='data\\tik_comments.json')
    # A raw string needs a single backslash; r'data\\1.text' would contain two.
    p_comments.add_argument('--file-path', default=r'data\1.text')
    p_comments.add_argument('--count', type=int, default=100)
    p_comments.add_argument('--pages', type=int, default=100)
    p_comments.add_argument('--timeout', type=int, default=30)
    p_comments.add_argument('--limit', type=int, default=None)
    p_comments.add_argument('--reply-count', type=int, default=100)
    p_comments.add_argument('--reply-pages', type=int, default=100)
    p_comments.add_argument('--reply-limit', type=int, default=None)
    p_comments.add_argument('--csv', default='data\\comments.csv')
    p_comments.add_argument('--workers', type=int, default=None)
    p_comments.set_defaults(func=run_comments)

    p_all = sub.add_parser('all')
    p_all.add_argument('--keyword', action='append')
    p_all.add_argument('--keywords', default=None)
    p_all.add_argument('--keywords-file', default=None)
    p_all.add_argument('--file-path', default=r'data\1.text')
    p_all.add_argument('--links-out', default='data\\urls.json')
    p_all.add_argument('--search-max-pages', type=int, default=50)
    p_all.add_argument('--search-count', type=int, default=None)
    p_all.add_argument('--search-timeout', type=int, default=30)
    p_all.add_argument('--search-workers', type=int, default=5)
    p_all.add_argument('--comments-out', default='data\\tik_comments.json')
    p_all.add_argument('--comments-count', type=int, default=100)
    p_all.add_argument('--comments-pages', type=int, default=100)
    p_all.add_argument('--comments-timeout', type=int, default=30)
    p_all.add_argument('--comments-limit', type=int, default=None)
    p_all.add_argument('--reply-count', type=int, default=100)
    p_all.add_argument('--reply-pages', type=int, default=100)
    p_all.add_argument('--reply-limit', type=int, default=None)
    p_all.add_argument('--csv', default='data\\comments.csv')
    p_all.add_argument('--comments-workers', type=int, default=None)
    p_all.set_defaults(func=run_all)

    p_mysql = sub.add_parser('mysql')
    p_mysql.add_argument('--csv', default='data\\comments.csv')
    p_mysql.add_argument('--host', default='localhost')
    p_mysql.add_argument('--port', type=int, default=3306)
    p_mysql.add_argument('--user', default='root')
    p_mysql.add_argument('--password', default='')
    p_mysql.add_argument('--database', default='crawler_tiktok')
    p_mysql.add_argument('--table', default='comments')
    def run_mysql(args):
        import_csv_to_mysql(args.csv, host=args.host, port=args.port, user=args.user, password=args.password, database=args.database, table=args.table)
    p_mysql.set_defaults(func=run_mysql)

    p_mysql_db = sub.add_parser('mysql-db')
    p_mysql_db.add_argument('--host', default='localhost')
    p_mysql_db.add_argument('--port', type=int, default=3306)
    p_mysql_db.add_argument('--user', default='root')
    p_mysql_db.add_argument('--password', default='')
    p_mysql_db.add_argument('--database', default='yunque')
    def run_mysql_db(args):
        create_database_if_not_exists(host=args.host, port=args.port, user=args.user, password=args.password, database=args.database)
    p_mysql_db.set_defaults(func=run_mysql_db)

    args = p.parse_args()
    if not args.cmd:
        p.print_help()
        raise SystemExit(1)
    args.func(args)


if __name__ == '__main__':
    main()
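`run_comments` and `run_all` in `main.py` repeat the same lookup: take the top-level `links` list, and if it is empty fall back to merging `items[*].links`. A minimal sketch of that lookup as one function follows. The name `collect_links` is hypothetical, and the order-preserving de-duplication is an addition for illustration that `main.py` does not perform.

```python
def collect_links(obj):
    """Hypothetical helper: prefer obj['links'], else merge items[*].links, keeping first-seen order."""
    links = obj.get("links") or []
    if not links:
        links = []
        for item in obj.get("items") or []:
            links.extend(item.get("links") or [])
    # De-duplicate while preserving order (an extra step, not present in main.py).
    seen = set()
    unique = []
    for link in links:
        if link not in seen:
            seen.add(link)
            unique.append(link)
    return unique


unified = {"links": ["a", "b", "a"], "items": [{"links": ["c"]}]}
per_keyword = {"items": [{"links": ["a"]}, {"links": ["b", "a"]}]}
print(collect_links(unified))
print(collect_links(per_keyword))
```

With a helper of this kind, both subcommands could share one code path and raise `SystemExit('no links')` only when the returned list is empty.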
BIN
tiktok/__pycache__/comments.cpython-312.pyc
Normal file
Binary file not shown.
BIN
tiktok/__pycache__/search.cpython-312.pyc
Normal file
Binary file not shown.
208
tiktok/comments.py
Normal file
@@ -0,0 +1,208 @@
|
|||||||
|
import json
|
||||||
|
import re
|
||||||
|
import threading
|
||||||
|
import time
|
||||||
|
from urllib.parse import urlparse, parse_qs, urlencode
|
||||||
|
from urllib.request import Request, urlopen
|
||||||
|
from core.curl import parse_curl_file
|
||||||
|
from utils.io import ensure_csv_header, append_csv_rows
|
||||||
|
from data.store import save_comments_snapshot
|
||||||
|
|
||||||
|
def _extract_aweme_id(link):
|
||||||
|
"""从视频链接中提取 aweme_id(/video/<id>)"""
|
||||||
|
m = re.search(r"/video/(\d+)", link)
|
||||||
|
return m.group(1) if m else None
|
||||||
|
|
||||||
|
def fetch_comments_aweme(aweme_id, file_path, count=20, max_pages=50, timeout=30, total_limit=None, referer=None):
    """Fetch the comments of one video, page by page.

    Parameters:
    - `aweme_id`: video ID
    - `file_path`: cURL text file (block 1 is the base comment request)
    - `count/max_pages/timeout`: page size, page cap, and request timeout
    - `total_limit`: cap on the total number of comments (optional)
    - `referer`: value for the `referer` request header (optional)
    Behavior: retries on failure, switches to the fallback comment endpoint
    when needed, and follows `has_more/next_cursor`.
    Returns: a list of comment objects.
    """
    reqs = parse_curl_file(file_path)
    if not reqs:
        return []
    base = reqs[0]
    headers = dict(base['headers'])
    if referer:
        headers['referer'] = referer
    cursor = 0
    all_comments = []
    for _ in range(max_pages):
        # Rebuild the base URL with this page's aweme_id / count / cursor.
        u_parsed = urlparse(base['url'])
        q = parse_qs(u_parsed.query)
        q['aweme_id'] = [str(aweme_id)]
        q['count'] = [str(count)]
        q['cursor'] = [str(cursor)]
        u = u_parsed._replace(query=urlencode(q, doseq=True)).geturl()
        data = None
        # Up to 3 attempts with a linearly growing pause.
        for i in range(3):
            try:
                req = Request(u, headers=headers, method='GET')
                with urlopen(req, timeout=timeout) as resp:
                    data = resp.read()
                break
            except Exception:
                time.sleep(0.5 * (i + 1))
                data = None
        try:
            # `data` is None when every attempt failed; that also lands in except.
            obj = json.loads(data.decode('utf-8', errors='ignore'))
        except Exception:
            obj = {}
        if not obj.get('comments'):
            # Fallback: the generic comment endpoint with minimal parameters.
            alt_params = {'aid': 1988, 'aweme_id': aweme_id, 'count': count, 'cursor': cursor}
            alt_url = 'https://www.tiktok.com/api/comment/list/?' + urlencode(alt_params)
            for i in range(2):
                try:
                    req = Request(alt_url, headers=headers, method='GET')
                    with urlopen(req, timeout=timeout) as resp:
                        data2 = resp.read()
                    obj2 = json.loads(data2.decode('utf-8', errors='ignore'))
                    if obj2.get('comments'):
                        obj = obj2
                        break
                except Exception:
                    time.sleep(0.5 * (i + 1))
        comments = obj.get('comments') or []
        limit_hit = False
        for c in comments:
            all_comments.append(c)
            if isinstance(total_limit, int) and total_limit > 0 and len(all_comments) >= total_limit:
                limit_hit = True
                break
        if limit_hit:
            # Stop paginating, not just the inner loop, once the cap is reached.
            break
        has_more = obj.get('has_more')
        next_cursor = obj.get('cursor') or obj.get('next_cursor')
        if has_more in (True, 1) and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        if comments and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        break
    return all_comments

def fetch_replies(comment_id, aweme_id, file_path, count=20, max_pages=50, timeout=30, total_limit=None):
    """Fetch the second-level replies of one comment, page by page.

    Parameters: `comment_id/aweme_id` identify the comment; the remaining
    parameters match the comment fetcher.
    Returns: a list of reply objects.
    """
    reqs = parse_curl_file(file_path)
    if not reqs:
        return []
    headers = reqs[0]['headers']
    base = 'https://www.tiktok.com/api/comment/list/reply/'
    cursor = 0
    replies = []
    for _ in range(max_pages):
        params = {'aid': 1988, 'aweme_id': aweme_id, 'comment_id': comment_id, 'count': count, 'cursor': cursor}
        url = base + '?' + urlencode(params)
        data = None
        # Up to 3 attempts with a linearly growing pause.
        for i in range(3):
            try:
                req = Request(url, headers=headers, method='GET')
                with urlopen(req, timeout=timeout) as resp:
                    data = resp.read()
                break
            except Exception:
                time.sleep(0.5 * (i + 1))
                data = None
        try:
            obj = json.loads(data.decode('utf-8', errors='ignore'))
        except Exception:
            obj = {}
        arr = obj.get('comments') or []
        limit_hit = False
        for r in arr:
            replies.append(r)
            if isinstance(total_limit, int) and total_limit > 0 and len(replies) >= total_limit:
                limit_hit = True
                break
        if limit_hit:
            # Stop paginating once the cap is reached.
            break
        has_more = obj.get('has_more')
        next_cursor = obj.get('cursor')
        if has_more in (True, 1) and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        break
    return replies

_csv_lock = threading.Lock()
_print_lock = threading.Lock()
_results_lock = threading.Lock()

def save_comments_from_links(links, out_path, file_path, count=20, pages=50, timeout=30, reply_count=20, reply_pages=50, total_limit=None, reply_total_limit=None, csv_path=None, workers=None):
    """Fetch comments and replies for video links concurrently and save a snapshot.

    Concurrency: one thread per link, optionally gated by a semaphore of size `workers`.
    CSV: if `csv_path` is given, append each comment and its replies as `username,text` rows.
    Output: writes `out_path` with the shape `{'items': [{link, count, comments: [...]}, ...]}`.
    """
    ensure_csv_header(csv_path, ['username', 'text'])
    results = []
    sem = None
    if isinstance(workers, int) and workers > 0:
        sem = threading.Semaphore(workers)

    def _process(link):
        if sem:
            sem.acquire()
        with _print_lock:
            print(f"[START] {link}", flush=True)
        try:
            cs = fetch_comments_aweme(_extract_aweme_id(link), file_path=file_path, count=count, max_pages=pages, timeout=timeout, total_limit=total_limit, referer=link)
            enriched = []
            for c in cs:
                cid = c.get('cid')
                if cid:
                    rs = fetch_replies(cid, _extract_aweme_id(link), file_path=file_path, count=reply_count, max_pages=reply_pages, timeout=timeout, total_limit=reply_total_limit)
                    # Copy before attaching replies so the fetched object is not mutated.
                    c = dict(c)
                    c['replies'] = rs
                    c['reply_count'] = len(rs)
                enriched.append(c)
                try:
                    with _print_lock:
                        print(f"{link} | cid={c.get('cid')} | create_time={c.get('create_time')} | reply_count={c.get('reply_count', 0)} | text={c.get('text')}", flush=True)
                except Exception:
                    pass
                if csv_path:
                    u = c.get('user') or {}
                    uname = u.get('unique_id') or u.get('nickname') or u.get('uid') or ''
                    rows = [[uname, c.get('text')]]
                    for r in c.get('replies', []) or []:
                        ru = r.get('user') or {}
                        runame = ru.get('unique_id') or ru.get('nickname') or ru.get('uid') or ''
                        rows.append([runame, r.get('text')])
                    with _csv_lock:
                        append_csv_rows(csv_path, rows)
            with _results_lock:
                results.append({'link': link, 'count': len(cs), 'comments': enriched})
            reply_total = sum(len(c.get('replies') or []) for c in enriched)
            with _print_lock:
                print(f"[DONE] {link} comments={len(cs)} replies={reply_total}", flush=True)
        except Exception as e:
            with _print_lock:
                print(f"[ERROR] {link} {e}", flush=True)
        finally:
            if sem:
                sem.release()

    threads = []
    for link in links:
        t = threading.Thread(target=_process, args=(link,))
        t.daemon = True
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    save_comments_snapshot(out_path, results)
    return out_path

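"""TikTok comment and reply scraping module.

Capabilities:
- Extract the aweme_id from a video link
- Page through comments via the comment endpoint (with a fallback endpoint)
- Fetch the second-level replies of each comment and attach them
- Optionally write a CSV and print progress logs
"""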
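The `has_more` / `cursor` contract that both fetchers in `tiktok/comments.py` follow can be exercised without any network access. The following is a minimal sketch, not the repository's code: `paginate`, `fetch_page`, and the `PAGES` data are hypothetical names introduced for illustration, with `fetch_page(cursor)` standing in for the HTTP request.

```python
def paginate(fetch_page, max_pages=50, total_limit=None):
    """Collect 'comments' across pages until has_more is falsy or a cap is hit."""
    items = []
    cursor = 0
    for _ in range(max_pages):
        obj = fetch_page(cursor) or {}
        for c in obj.get('comments') or []:
            items.append(c)
            if total_limit and len(items) >= total_limit:
                # Returning here ends pagination, not only the inner loop.
                return items
        next_cursor = obj.get('cursor')
        if obj.get('has_more') in (True, 1) and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        break
    return items


# Fake backend (made-up data): three pages keyed by cursor.
PAGES = {
    0: {'comments': [{'cid': '1'}, {'cid': '2'}], 'has_more': 1, 'cursor': 2},
    2: {'comments': [{'cid': '3'}, {'cid': '4'}], 'has_more': 1, 'cursor': 4},
    4: {'comments': [{'cid': '5'}], 'has_more': 0, 'cursor': 5},
}

all_items = paginate(PAGES.get)
capped = paginate(PAGES.get, total_limit=3)
```

Passing the page source in as a callable keeps the loop testable; the real functions inline the request and retry logic instead.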
171
tiktok/search.py
Normal file
@@ -0,0 +1,171 @@
import json
import re
import threading
import time
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse
from urllib.request import Request, urlopen
from core.curl import parse_curl_file
from data.store import save_links_snapshot

def _update_query(url, updates):
    """Return a new URL with the query parameters in `updates` applied to `url`."""
    p = urlparse(url)
    q = parse_qs(p.query)
    for k, v in updates.items():
        q[k] = [str(v)]
    new_q = urlencode(q, doseq=True)
    return urlunparse((p.scheme, p.netloc, p.path, p.params, new_q, p.fragment))

def _extract_links(obj):
    """Extract video links from a response object.

    Structured pass: build links from `data -> item -> author.uniqueId + item.id`.
    Fallback pass: walk every string field and match TikTok links with regexes.
    Returns: a list of links (may contain duplicates; the caller de-duplicates).
    """
    links = []
    data = obj.get('data') if isinstance(obj, dict) else None
    if isinstance(data, list):
        for e in data:
            if isinstance(e, dict) and e.get('type') == 1 and isinstance(e.get('item'), dict):
                it = e['item']
                author = it.get('author') or {}
                uid = author.get('uniqueId')
                vid = it.get('id')
                if uid and vid:
                    links.append(f"https://www.tiktok.com/@{uid}/video/{vid}")
    patterns = [
        r"https?://www\.tiktok\.com/[\w@._-]+/video/\d+",
        r"https?://www\.tiktok\.com/video/\d+",
        r"https?://vm\.tiktok\.com/[\w-]+",
        r"https?://vt\.tiktok\.com/[\w-]+",
    ]

    def rec(x):
        # Recursively visit dicts and lists; scan every string with each pattern.
        if isinstance(x, dict):
            for v in x.values():
                rec(v)
        elif isinstance(x, list):
            for v in x:
                rec(v)
        elif isinstance(x, str):
            s = x
            for pat in patterns:
                for m in re.finditer(pat, s):
                    links.append(m.group(0))

    rec(obj)
    return links

def search_video_links(keyword, file_path, max_pages=50, timeout=30, count=None, on_link=None):
    """Search video links for one keyword, page by page.

    Input: the base URL and headers come from the 2nd curl request in `file_path`.
    Behavior: pages through results with retries, parses links, and calls
    `on_link` for each link not seen before in this search.
    Returns: the links found for this keyword, de-duplicated within this call;
    de-duplication across keywords is left to the caller.
    """
    reqs = parse_curl_file(file_path)
    if len(reqs) < 2:
        return []
    base = reqs[1]
    headers = base['headers']
    parsed = urlparse(base['url'])
    q = parse_qs(parsed.query)
    if count is None:
        # Default to the page size in the captured request, else 12.
        if 'count' in q:
            try:
                count = int(q['count'][0])
            except Exception:
                count = 12
        else:
            count = 12
    all_links = []
    seen = set()
    offset = 0
    cursor = None
    for _ in range(max_pages):
        params = {'keyword': keyword, 'count': count}
        if cursor is not None:
            params['offset'] = cursor
        else:
            params['offset'] = offset
        u = _update_query(base['url'], params)
        data = None
        # Up to 3 attempts with a linearly growing pause.
        for i in range(3):
            try:
                req = Request(u, headers=headers, method='GET')
                with urlopen(req, timeout=timeout) as resp:
                    data = resp.read()
                break
            except Exception:
                time.sleep(0.5 * (i + 1))
                data = None
        try:
            obj = json.loads(data.decode('utf-8', errors='ignore'))
        except Exception:
            obj = {}
        links = _extract_links(obj)
        has_more = obj.get('has_more')
        next_cursor = obj.get('cursor')
        new = 0
        for link in links:
            if link not in seen:
                seen.add(link)
                all_links.append(link)
                new += 1
                if on_link:
                    try:
                        on_link(link)
                    except Exception:
                        pass
        if has_more in (True, 1) and isinstance(next_cursor, int):
            cursor = next_cursor
            continue
        if new == 0:
            break
        offset += count
    return all_links

_print_lock = threading.Lock()


def save_links_multi(keywords, out_path, file_path, max_pages=50, timeout=30, count=None, workers=5):
    """Search several keywords concurrently and save a snapshot.

    Concurrency: one thread per keyword, gated by a semaphore of size `workers`;
    links are de-duplicated across all keywords.
    Output: writes `out_path`, containing `keywords/items/total_count/links`.
    """
    all_links = []
    seen = set()
    items = []
    seen_lock = threading.Lock()
    sem = threading.Semaphore(max(1, int(workers)))

    def worker(kw):
        with sem:
            item_links = []

            def on_new(link):
                # Global de-duplication: only the first keyword to find a link keeps it.
                with seen_lock:
                    if link not in seen:
                        seen.add(link)
                        all_links.append(link)
                        item_links.append(link)
                        with _print_lock:
                            print(link, flush=True)

            search_video_links(kw, file_path=file_path, max_pages=max_pages, timeout=timeout, count=count, on_link=on_new)
            items.append({'keyword': kw, 'count': len(item_links), 'links': item_links})

    threads = []
    for kw in keywords:
        t = threading.Thread(target=worker, args=(kw,))
        t.daemon = True
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    save_links_snapshot(out_path, keywords, items, all_links)
    return out_path

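"""TikTok video link search module.

Core capabilities:
- Build the query URL (update keyword/offset/count and similar parameters)
- Send requests and parse video links from the response (structured pass plus regex fallback)
- Search several keywords concurrently, de-duplicate globally, and save a snapshot
"""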
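The query rewriting that `_update_query` performs is easy to check in isolation. A minimal sketch, assuming a made-up `example.com` URL in place of a captured search request; `update_query` is a standalone copy written for this example.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse


def update_query(url, updates):
    """Overwrite (or add) query parameters while keeping the rest of the URL."""
    p = urlparse(url)
    q = parse_qs(p.query)
    for k, v in updates.items():
        q[k] = [str(v)]
    return urlunparse((p.scheme, p.netloc, p.path, p.params, urlencode(q, doseq=True), p.fragment))


# Hypothetical base URL; a real one comes from the second cURL block.
base = 'https://example.com/api/search/?aid=1988&keyword=old&offset=0'
page2 = update_query(base, {'keyword': 'blue pen', 'offset': 12, 'count': 12})
params = parse_qs(urlparse(page2).query)
```

One caveat: `parse_qs` drops parameters with empty values unless `keep_blank_values=True` is passed, so a captured URL containing `foo=` loses that parameter after the rewrite.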
BIN
utils/__pycache__/io.cpython-312.pyc
Normal file
Binary file not shown.
55
utils/filter_comments.py
Normal file
@@ -0,0 +1,55 @@
|
|||||||
|
import argparse
import csv
import os


def filter_comments(csv_in, csv_out, keywords):
    """Copy rows whose text contains any keyword (case-insensitive substring) to `csv_out`."""
    ks = set(k.lower() for k in keywords if k)
    rows_out = []
    with open(csv_in, 'r', encoding='utf-8', newline='') as f:
        reader = csv.reader(f)
        first = True
        for row in reader:
            # Skip the header row if the file starts with one.
            if first and row and row[0].lower() == 'username':
                first = False
                continue
            first = False
            if not row:
                continue
            text = row[1] if len(row) > 1 else ''
            s = (text or '').lower()
            if any(k in s for k in ks):
                rows_out.append(row)
    out_dir = os.path.dirname(csv_out)
    if out_dir:
        # os.makedirs('') raises, so only create a directory when the path has one.
        os.makedirs(out_dir, exist_ok=True)
    with open(csv_out, 'w', encoding='utf-8', newline='') as wf:
        w = csv.writer(wf)
        w.writerow(['username', 'text'])
        for row in rows_out:
            w.writerow(row)
    print(f"input={csv_in} keywords={len(ks)} matched_rows={len(rows_out)} out={csv_out}")

def main():
    p = argparse.ArgumentParser()
    # Defaults are Windows-style paths; override them on other platforms.
    p.add_argument('--extern-keywords', default=r'd:\work\test\test\all_keywords.txt')
    p.add_argument('--local-keywords', default=r'data\keyword.txt')
    p.add_argument('--csv-in', default=r'data\comments.csv')
    p.add_argument('--csv-out', default=r'data\key_comment.csv')
    args = p.parse_args()

    def _load(path):
        # Read one keyword per line, skipping blanks; unreadable files yield [].
        arr = []
        try:
            with open(path, 'r', encoding='utf-8') as f:
                for line in f:
                    s = line.strip()
                    if s:
                        arr.append(s)
        except Exception:
            arr = []
        return arr

    kws = []
    kws.extend(_load(args.extern_keywords))
    kws.extend(_load(args.local_keywords))
    # 'pen' is always included, in addition to the keywords loaded from files.
    kws.append('pen')
    filter_comments(args.csv_in, args.csv_out, kws)


if __name__ == '__main__':
    main()
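The matching rule in `filter_comments` is a case-insensitive substring test on the second column. A small in-memory sketch makes the consequence visible; `filter_rows` and the sample rows are made up for this example and use `io.StringIO` in place of real files.

```python
import csv
import io


def filter_rows(rows, keywords):
    """Keep rows whose second column contains any keyword (case-insensitive)."""
    ks = {k.lower() for k in keywords if k}
    out = []
    for i, row in enumerate(rows):
        if not row:
            continue
        if i == 0 and row[0].lower() == 'username':
            continue  # header row
        text = row[1] if len(row) > 1 else ''
        if any(k in (text or '').lower() for k in ks):
            out.append(row)
    return out


sample = io.StringIO(
    'username,text\n'
    'alice,Love this PEN!\n'
    'bob,nice video\n'
    'carol,"pencil case, please"\n'
)
matched = filter_rows(csv.reader(sample), ['pen'])
```

Because the test is a plain substring match, the keyword `pen` also matches `pencil`; word-boundary matching would need a different test, such as a regex.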
54
utils/io.py
Normal file
@@ -0,0 +1,54 @@
|
|||||||
|
import json
import os
import csv


def load_keywords_from_file(path):
    """Read a keyword file line by line, skip blank lines, and return a list."""
    arr = []
    try:
        with open(path, 'r', encoding='utf-8') as f:
            for line in f:
                s = line.strip()
                if s:
                    arr.append(s)
    except Exception:
        arr = []
    return arr

def write_json(path, obj):
    """Write JSON as UTF-8, keeping non-ASCII characters and indenting by 2."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(obj, f, ensure_ascii=False, indent=2)


def read_json(path):
    """Read a JSON file; return an empty dict on failure."""
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except Exception:
        return {}

def ensure_csv_header(path, headers):
    """Create the CSV with a header row if it does not exist; do nothing for an empty path."""
    if not path:
        return
    if not os.path.exists(path):
        with open(path, 'w', newline='', encoding='utf-8') as wf:
            w = csv.writer(wf)
            w.writerow(headers)


def append_csv_rows(path, rows):
    """Append rows (each a list of cells) to the CSV; do nothing for an empty path."""
    if not path:
        return
    with open(path, 'a', newline='', encoding='utf-8') as af:
        w = csv.writer(af)
        for r in rows:
            w.writerow(r)

"""General-purpose IO utilities.

Provides:
- Keyword file loading
- JSON reading and writing
- CSV writing (ensure header, append rows)
"""
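The two CSV helpers in `utils/io.py` are meant to be used as a pair: create the header once, then append. A self-contained sketch of that flow in a temporary directory follows; the helper bodies are condensed copies written for this example, and the sample rows are made up.

```python
import csv
import os
import tempfile


def ensure_csv_header(path, headers):
    """Write the header only when the file does not exist yet."""
    if not path:
        return
    if not os.path.exists(path):
        with open(path, 'w', newline='', encoding='utf-8') as wf:
            csv.writer(wf).writerow(headers)


def append_csv_rows(path, rows):
    """Append rows; newline='' lets the csv module control line endings."""
    if not path:
        return
    with open(path, 'a', newline='', encoding='utf-8') as af:
        csv.writer(af).writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'comments.csv')
    ensure_csv_header(path, ['username', 'text'])
    ensure_csv_header(path, ['username', 'text'])  # no-op: file already exists
    append_csv_rows(path, [['alice', 'hello'], ['bob', 'line1\nline2']])
    with open(path, 'r', newline='', encoding='utf-8') as f:
        rows = list(csv.reader(f))
```

Opening with `newline=''` on both write and read is what lets a comment containing a line break survive the round trip as a single cell.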