Files
ai_stock/DATA_SCHEMA.md
2025-12-08 15:30:19 +08:00

287 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# 量化交易数据表设计(建议版)
本文档给出一个兼顾本地研究SQLite与生产化部署PostgreSQL的数据模型。目标可重复、可扩展、便于回测和线上实时监控支持行情、基本面、新闻情绪、信号、订单、成交、持仓与风控度量。
## 设计原则
- 时间统一:全部时间戳使用 UTC列命名统一为 `*_utc``as_of_date`UTC 日期)。
- 主键稳定:优先采用业务自然键的组合主键(如 `(symbol_id, ts_utc)`),跨系统对齐使用 surrogate key自增或 UUID
- 数值约定:
- 价格/金额使用 `NUMERIC(18,6)`PG`REAL`/`DECIMAL`SQLite
- 涨跌幅/比率存“小数形式”:如 4.02% 存 `0.0402`
- 幂等写入:所有采集/计算表支持 Upsert通过唯一键 + `ON CONFLICT DO UPDATE`PG`INSERT OR REPLACE`SQLite
- 分区与保留高频表按时间分区PG并制定数据保留策略如 Tick 保留 30 天1 分钟 Bar 保留 180 天,日线长期保留)。
## 表清单与字段
### 1) `symbols`(标的主表)
- 用途:统一管理股票/ETF 元数据。
- 主键:`id`PK自增/UUID唯一约束`(symbol, exchange)`
- 字段:
- `id` PK
- `symbol` 股票代码(如 `AAPL`
- `name` 名称
- `exchange` 交易所(如 `NASDAQ`
- `currency` 货币(如 `USD`
- `tick_size``lot_size`
- `sector``industry`
- `is_active` 布尔
- `first_seen_utc``last_seen_utc`
### 2) `calendars`(交易日历)
- 主键:`(exchange, date)`
- 字段:`is_trading_day``open_time_utc``close_time_utc``notes`
### 3) `bars_1m`1 分钟 K 线)
- 主键:`(symbol_id, ts_utc)``ts_utc` 为该分钟起始时刻)。
- 索引:`(ts_utc)``(symbol_id, ts_utc DESC)`
- 字段:`open``high``low``close``volume``vwap``trades_count``source`
### 4) `bars_1d`(日线 K 线)
- 主键:`(symbol_id, as_of_date)`
- 字段:`open``high``low``close``adj_close``volume``dividend``split_ratio``source`
### 5) `ticks`(逐笔/Level-1 快照,可选)
- 主键:`id`(自增/UUID唯一键建议`(symbol_id, ts_utc, source, seq)`
- 字段:`price``size``bid``ask``bid_size``ask_size``condition``seq``source`
### 6) `corporate_actions`(公司行为)
- 主键:`(symbol_id, ex_date, type)`
- 字段:`type``split`/`dividend`/...)、`amount``ratio``currency``notes`
### 7) `fundamentals_snapshot`(基本面快照)
- 主键:`(symbol_id, as_of_date)`
- 字段示例:`market_cap``pe_ttm``ps_ttm``pb``eps_ttm``revenue_ttm``shares_outstanding``updated_at_utc`
### 8) `news` 与 `news_symbols`(新闻与关联表)
- `news` 主键:`id`UUID/自增);字段:`published_at_utc``source``title``url``summary``sentiment_score``topics`
- `news_symbols` 主键:`(news_id, symbol_id)`
### 9) `signals`(策略信号)
- 主键:`id`UUID/自增);唯一建议:`(symbol_id, generated_at_utc, model_name, version)`
- 字段:
- `symbol_id``generated_at_utc`
- `signal_type`(如 `momentum`/`reversal`
- `direction``BUY`/`SELL`/`HOLD`
- `score`01 或 z-score
- `horizon`(如 `1d`/`1h`
- `params_json`(策略参数 JSON
- `model_name``version`
- `expires_at_utc`(过期时间,可空)
### 10) `orders`(订单)
- 主键:`id`;唯一建议:`broker_order_id`(如接入实盘)。
- 字段:`signal_id``symbol_id``side``order_type``qty``price``time_in_force``status``created_at_utc``updated_at_utc``broker_order_id`
### 11) `executions`(成交/回执)
- 主键:`id`;索引:`(order_id)``(exec_time_utc)`
- 字段:`order_id``exec_time_utc``price``qty``fee``liquidity``maker`/`taker`)。
### 12) `positions`(持仓快照)
- 主键:`(portfolio_id, symbol_id)` 或附带 `as_of_date` 做日终表。
- 字段:`qty``avg_cost``unrealized_pnl``realized_pnl``last_updated_utc`
### 13) `portfolios` / `portfolio_nav_daily`(组合与净值)
- `portfolios``id` PK、`name``base_currency``created_at_utc`
- `portfolio_nav_daily` 主键:`(portfolio_id, as_of_date)`;字段:`cash``equity_value``nav``daily_return``gross_exposure``net_exposure`
### 14) `risk_metrics_daily`(风险指标)
- 主键:`(portfolio_id, as_of_date)`;字段:`var_95``beta``sharpe``max_drawdown``volatility_20d` 等。
### 15) `etl_runs`(任务运行元数据)
- 主键:`run_id`;字段:`task_name``started_at_utc``finished_at_utc``status``rows_affected``checksum`
## PostgreSQL 示例 DDL核心表
```sql
-- 1) 标的
CREATE TABLE symbols (
id BIGSERIAL PRIMARY KEY,
symbol TEXT NOT NULL,
name TEXT,
exchange TEXT NOT NULL,
currency TEXT DEFAULT 'USD',
tick_size NUMERIC(18,6),
lot_size NUMERIC(18,6),
sector TEXT,
industry TEXT,
is_active BOOLEAN DEFAULT TRUE,
first_seen_utc TIMESTAMPTZ,
last_seen_utc TIMESTAMPTZ,
UNIQUE(symbol, exchange)
);
-- 2) 1 分钟 K 线
CREATE TABLE bars_1m (
symbol_id BIGINT NOT NULL REFERENCES symbols(id),
ts_utc TIMESTAMPTZ NOT NULL,
open NUMERIC(18,6) NOT NULL,
high NUMERIC(18,6) NOT NULL,
low NUMERIC(18,6) NOT NULL,
close NUMERIC(18,6) NOT NULL,
volume BIGINT,
vwap NUMERIC(18,6),
trades_count INTEGER,
source TEXT,
PRIMARY KEY(symbol_id, ts_utc)
);
CREATE INDEX ON bars_1m (ts_utc);
CREATE INDEX ON bars_1m (symbol_id, ts_utc DESC);
-- 3) 日线 K 线
CREATE TABLE bars_1d (
symbol_id BIGINT NOT NULL REFERENCES symbols(id),
as_of_date DATE NOT NULL,
open NUMERIC(18,6) NOT NULL,
high NUMERIC(18,6) NOT NULL,
low NUMERIC(18,6) NOT NULL,
close NUMERIC(18,6) NOT NULL,
adj_close NUMERIC(18,6),
volume BIGINT,
dividend NUMERIC(18,6),
split_ratio NUMERIC(18,6),
source TEXT,
PRIMARY KEY(symbol_id, as_of_date)
);
-- 4) 信号表(涨跌幅等指标请用小数存储)
CREATE TABLE signals (
id BIGSERIAL PRIMARY KEY,
symbol_id BIGINT NOT NULL REFERENCES symbols(id),
generated_at_utc TIMESTAMPTZ NOT NULL,
signal_type TEXT NOT NULL,
direction TEXT NOT NULL CHECK (direction IN ('BUY','SELL','HOLD')),
score NUMERIC(18,6),
horizon TEXT,
params_json JSONB,
model_name TEXT,
version TEXT,
expires_at_utc TIMESTAMPTZ,
UNIQUE(symbol_id, generated_at_utc, model_name, version)
);
CREATE INDEX ON signals (symbol_id, generated_at_utc DESC);
-- 5) 订单/成交
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
signal_id BIGINT REFERENCES signals(id),
symbol_id BIGINT NOT NULL REFERENCES symbols(id),
side TEXT NOT NULL CHECK (side IN ('BUY','SELL')),
order_type TEXT NOT NULL CHECK (order_type IN ('MKT','LMT')),
qty NUMERIC(18,6) NOT NULL,
price NUMERIC(18,6),
time_in_force TEXT,
status TEXT NOT NULL,
broker_order_id TEXT,
created_at_utc TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at_utc TIMESTAMPTZ
);
CREATE UNIQUE INDEX IF NOT EXISTS orders_broker_unique ON orders(broker_order_id) WHERE broker_order_id IS NOT NULL;
CREATE TABLE executions (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id),
exec_time_utc TIMESTAMPTZ NOT NULL,
price NUMERIC(18,6) NOT NULL,
qty NUMERIC(18,6) NOT NULL,
fee NUMERIC(18,6),
liquidity TEXT
);
CREATE INDEX ON executions(order_id);
CREATE INDEX ON executions(exec_time_utc);
```
## SQLite 示例 DDL简化版
```sql
CREATE TABLE symbols (
id INTEGER PRIMARY KEY AUTOINCREMENT,
symbol TEXT NOT NULL,
name TEXT,
exchange TEXT NOT NULL,
currency TEXT,
tick_size REAL,
lot_size REAL,
sector TEXT,
industry TEXT,
is_active INTEGER DEFAULT 1,
first_seen_utc TEXT,
last_seen_utc TEXT,
UNIQUE(symbol, exchange)
);
CREATE TABLE bars_1m (
symbol_id INTEGER NOT NULL,
ts_utc TEXT NOT NULL,
open REAL NOT NULL,
high REAL NOT NULL,
low REAL NOT NULL,
close REAL NOT NULL,
volume INTEGER,
vwap REAL,
trades_count INTEGER,
source TEXT,
PRIMARY KEY(symbol_id, ts_utc)
);
CREATE TABLE signals (
id INTEGER PRIMARY KEY AUTOINCREMENT,
symbol_id INTEGER NOT NULL,
generated_at_utc TEXT NOT NULL,
signal_type TEXT NOT NULL,
direction TEXT NOT NULL,
score REAL,
horizon TEXT,
params_json TEXT,
model_name TEXT,
version TEXT,
expires_at_utc TEXT,
UNIQUE(symbol_id, generated_at_utc, model_name, version)
);
```
## 典型索引与查询
- 最新信号:
```sql
SELECT DISTINCT ON (s.symbol_id)
s.*
FROM signals s
ORDER BY s.symbol_id, s.generated_at_utc DESC;
```
- 连接 1 分钟线与信号(取信号后最近 30 分钟):
```sql
SELECT b.*
FROM signals s
JOIN bars_1m b
ON b.symbol_id = s.symbol_id
AND b.ts_utc BETWEEN s.generated_at_utc AND s.generated_at_utc + INTERVAL '30 minutes'
WHERE s.generated_at_utc >= NOW() - INTERVAL '1 day';
```
- 计算日收益(简单取 `close`
```sql
SELECT symbol_id,
as_of_date,
close / LAG(close) OVER (PARTITION BY symbol_id ORDER BY as_of_date) - 1 AS daily_return
FROM bars_1d;
```
## 数据保留与分区建议PG
- `ticks`:分区粒度=日,保留 3090 天。
- `bars_1m`:分区粒度=月,保留 180365 天。
- `bars_1d``signals``orders``executions`:长期保留。
## 与当前项目的对接要点
- 数值规范:`change_ratio` 等“比率”统一用小数存储(代码已修正),写入时无需 `%`
- 表落地策略:
- 初期SQLite 单文件,开发/回测方便;
- 扩展PostgreSQL + 分区 + 指标物化视图(如聚合分钟线)。
- ETL 幂等:抓取任务用 `(symbol_id, ts_utc)``(symbol_id, as_of_date)` 做 Upsert避免重复数据导致回测偏差。
## 后续扩展
- `features_*`:因子特征宽表(按频率区分:分钟/日)。
- `models_registry`:模型注册与版本追踪。
- `backtest_runs`:回测任务与指标结果表(收益、回撤、卡方检验等)。
---
如需,我可以按上述 DDL 直接生成 SQLite 数据库,并将 `monitor.py` 在每轮拉取后把 Top N 的 1 分钟 bar、信号落到库中Upsert用于后续回测与可视化分析。