287 lines
10 KiB
Markdown
287 lines
10 KiB
Markdown
|
|
# 量化交易数据表设计(建议版)
|
|||
|
|
|
|||
|
|
本文档给出一个兼顾本地研究(SQLite)与生产化部署(PostgreSQL)的数据模型。目标:可重复、可扩展、便于回测和线上实时监控,支持行情、基本面、新闻情绪、信号、订单、成交、持仓与风控度量。
|
|||
|
|
|
|||
|
|
## 设计原则
|
|||
|
|
- 时间统一:全部时间戳使用 UTC;列命名统一为 `*_utc` 或 `as_of_date`(UTC 日期)。
|
|||
|
|
- 主键稳定:优先采用业务自然键的组合主键(如 `(symbol_id, ts_utc)`),跨系统对齐使用 surrogate key(自增或 UUID)。
|
|||
|
|
- 数值约定:
|
|||
|
|
- 价格/金额使用 `NUMERIC(18,6)`(PG)或 `REAL`/`DECIMAL`(SQLite);
|
|||
|
|
- 涨跌幅/比率存“小数形式”:如 4.02% 存 `0.0402`。
|
|||
|
|
- 幂等写入:所有采集/计算表支持 Upsert;通过唯一键 + `ON CONFLICT DO UPDATE`(PG)或 `INSERT OR REPLACE`(SQLite)。
|
|||
|
|
- 分区与保留:高频表按时间分区(PG),并制定数据保留策略(如 Tick 保留 30 天,1 分钟 Bar 保留 180 天,日线长期保留)。
|
|||
|
|
|
|||
|
|
## 表清单与字段
|
|||
|
|
|
|||
|
|
### 1) `symbols`(标的主表)
|
|||
|
|
- 用途:统一管理股票/ETF 元数据。
|
|||
|
|
- 主键:`id`(PK,自增/UUID);唯一约束:`(symbol, exchange)`。
|
|||
|
|
- 字段:
|
|||
|
|
- `id` PK
|
|||
|
|
- `symbol` 股票代码(如 `AAPL`)
|
|||
|
|
- `name` 名称
|
|||
|
|
- `exchange` 交易所(如 `NASDAQ`)
|
|||
|
|
- `currency` 货币(如 `USD`)
|
|||
|
|
- `tick_size`、`lot_size`
|
|||
|
|
- `sector`、`industry`
|
|||
|
|
- `is_active` 布尔
|
|||
|
|
- `first_seen_utc`、`last_seen_utc`
|
|||
|
|
|
|||
|
|
### 2) `calendars`(交易日历)
|
|||
|
|
- 主键:`(exchange, date)`。
|
|||
|
|
- 字段:`is_trading_day`、`open_time_utc`、`close_time_utc`、`notes`。
|
|||
|
|
|
|||
|
|
### 3) `bars_1m`(1 分钟 K 线)
|
|||
|
|
- 主键:`(symbol_id, ts_utc)`(`ts_utc` 为该分钟起始时刻)。
|
|||
|
|
- 索引:`(ts_utc)`、`(symbol_id, ts_utc DESC)`。
|
|||
|
|
- 字段:`open`、`high`、`low`、`close`、`volume`、`vwap`、`trades_count`、`source`。
|
|||
|
|
|
|||
|
|
### 4) `bars_1d`(日线 K 线)
|
|||
|
|
- 主键:`(symbol_id, as_of_date)`。
|
|||
|
|
- 字段:`open`、`high`、`low`、`close`、`adj_close`、`volume`、`dividend`、`split_ratio`、`source`。
|
|||
|
|
|
|||
|
|
### 5) `ticks`(逐笔/Level-1 快照,可选)
|
|||
|
|
- 主键:`id`(自增/UUID);唯一键建议:`(symbol_id, ts_utc, source, seq)`。
|
|||
|
|
- 字段:`price`、`size`、`bid`、`ask`、`bid_size`、`ask_size`、`condition`、`seq`、`source`。
|
|||
|
|
|
|||
|
|
### 6) `corporate_actions`(公司行为)
|
|||
|
|
- 主键:`(symbol_id, ex_date, type)`。
|
|||
|
|
- 字段:`type`(`split`/`dividend`/...)、`amount`、`ratio`、`currency`、`notes`。
|
|||
|
|
|
|||
|
|
### 7) `fundamentals_snapshot`(基本面快照)
|
|||
|
|
- 主键:`(symbol_id, as_of_date)`。
|
|||
|
|
- 字段示例:`market_cap`、`pe_ttm`、`ps_ttm`、`pb`、`eps_ttm`、`revenue_ttm`、`shares_outstanding`、`updated_at_utc`。
|
|||
|
|
|
|||
|
|
### 8) `news` 与 `news_symbols`(新闻与关联表)
|
|||
|
|
- `news` 主键:`id`(UUID/自增);字段:`published_at_utc`、`source`、`title`、`url`、`summary`、`sentiment_score`、`topics`。
|
|||
|
|
- `news_symbols` 主键:`(news_id, symbol_id)`。
|
|||
|
|
|
|||
|
|
### 9) `signals`(策略信号)
|
|||
|
|
- 主键:`id`(UUID/自增);唯一建议:`(symbol_id, generated_at_utc, model_name, version)`。
|
|||
|
|
- 字段:
|
|||
|
|
- `symbol_id`、`generated_at_utc`
|
|||
|
|
- `signal_type`(如 `momentum`/`reversal`)
|
|||
|
|
- `direction`(`BUY`/`SELL`/`HOLD`)
|
|||
|
|
- `score`(0–1 或 z-score)
|
|||
|
|
- `horizon`(如 `1d`/`1h`)
|
|||
|
|
- `params_json`(策略参数 JSON)
|
|||
|
|
- `model_name`、`version`
|
|||
|
|
- `expires_at_utc`(过期时间,可空)
|
|||
|
|
|
|||
|
|
### 10) `orders`(订单)
|
|||
|
|
- 主键:`id`;唯一建议:`broker_order_id`(如接入实盘)。
|
|||
|
|
- 字段:`signal_id`、`symbol_id`、`side`、`order_type`、`qty`、`price`、`time_in_force`、`status`、`created_at_utc`、`updated_at_utc`、`broker_order_id`。
|
|||
|
|
|
|||
|
|
### 11) `executions`(成交/回执)
|
|||
|
|
- 主键:`id`;索引:`(order_id)`、`(exec_time_utc)`。
|
|||
|
|
- 字段:`order_id`、`exec_time_utc`、`price`、`qty`、`fee`、`liquidity`(`maker`/`taker`)。
|
|||
|
|
|
|||
|
|
### 12) `positions`(持仓快照)
|
|||
|
|
- 主键:`(portfolio_id, symbol_id)` 或附带 `as_of_date` 做日终表。
|
|||
|
|
- 字段:`qty`、`avg_cost`、`unrealized_pnl`、`realized_pnl`、`last_updated_utc`。
|
|||
|
|
|
|||
|
|
### 13) `portfolios` / `portfolio_nav_daily`(组合与净值)
|
|||
|
|
- `portfolios`:`id` PK、`name`、`base_currency`、`created_at_utc`。
|
|||
|
|
- `portfolio_nav_daily` 主键:`(portfolio_id, as_of_date)`;字段:`cash`、`equity_value`、`nav`、`daily_return`、`gross_exposure`、`net_exposure`。
|
|||
|
|
|
|||
|
|
### 14) `risk_metrics_daily`(风险指标)
|
|||
|
|
- 主键:`(portfolio_id, as_of_date)`;字段:`var_95`、`beta`、`sharpe`、`max_drawdown`、`volatility_20d` 等。
|
|||
|
|
|
|||
|
|
### 15) `etl_runs`(任务运行元数据)
|
|||
|
|
- 主键:`run_id`;字段:`task_name`、`started_at_utc`、`finished_at_utc`、`status`、`rows_affected`、`checksum`。
|
|||
|
|
|
|||
|
|
## PostgreSQL 示例 DDL(核心表)
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
-- 1) 标的
|
|||
|
|
CREATE TABLE symbols (
|
|||
|
|
id BIGSERIAL PRIMARY KEY,
|
|||
|
|
symbol TEXT NOT NULL,
|
|||
|
|
name TEXT,
|
|||
|
|
exchange TEXT NOT NULL,
|
|||
|
|
currency TEXT DEFAULT 'USD',
|
|||
|
|
tick_size NUMERIC(18,6),
|
|||
|
|
lot_size NUMERIC(18,6),
|
|||
|
|
sector TEXT,
|
|||
|
|
industry TEXT,
|
|||
|
|
is_active BOOLEAN DEFAULT TRUE,
|
|||
|
|
first_seen_utc TIMESTAMPTZ,
|
|||
|
|
last_seen_utc TIMESTAMPTZ,
|
|||
|
|
UNIQUE(symbol, exchange)
|
|||
|
|
);
|
|||
|
|
|
|||
|
|
-- 2) 1 分钟 K 线
|
|||
|
|
CREATE TABLE bars_1m (
|
|||
|
|
symbol_id BIGINT NOT NULL REFERENCES symbols(id),
|
|||
|
|
ts_utc TIMESTAMPTZ NOT NULL,
|
|||
|
|
open NUMERIC(18,6) NOT NULL,
|
|||
|
|
high NUMERIC(18,6) NOT NULL,
|
|||
|
|
low NUMERIC(18,6) NOT NULL,
|
|||
|
|
close NUMERIC(18,6) NOT NULL,
|
|||
|
|
volume BIGINT,
|
|||
|
|
vwap NUMERIC(18,6),
|
|||
|
|
trades_count INTEGER,
|
|||
|
|
source TEXT,
|
|||
|
|
PRIMARY KEY(symbol_id, ts_utc)
|
|||
|
|
);
|
|||
|
|
CREATE INDEX ON bars_1m (ts_utc);
|
|||
|
|
CREATE INDEX ON bars_1m (symbol_id, ts_utc DESC);
|
|||
|
|
|
|||
|
|
-- 3) 日线 K 线
|
|||
|
|
CREATE TABLE bars_1d (
|
|||
|
|
symbol_id BIGINT NOT NULL REFERENCES symbols(id),
|
|||
|
|
as_of_date DATE NOT NULL,
|
|||
|
|
open NUMERIC(18,6) NOT NULL,
|
|||
|
|
high NUMERIC(18,6) NOT NULL,
|
|||
|
|
low NUMERIC(18,6) NOT NULL,
|
|||
|
|
close NUMERIC(18,6) NOT NULL,
|
|||
|
|
adj_close NUMERIC(18,6),
|
|||
|
|
volume BIGINT,
|
|||
|
|
dividend NUMERIC(18,6),
|
|||
|
|
split_ratio NUMERIC(18,6),
|
|||
|
|
source TEXT,
|
|||
|
|
PRIMARY KEY(symbol_id, as_of_date)
|
|||
|
|
);
|
|||
|
|
|
|||
|
|
-- 4) 信号表(涨跌幅等指标请用小数存储)
|
|||
|
|
CREATE TABLE signals (
|
|||
|
|
id BIGSERIAL PRIMARY KEY,
|
|||
|
|
symbol_id BIGINT NOT NULL REFERENCES symbols(id),
|
|||
|
|
generated_at_utc TIMESTAMPTZ NOT NULL,
|
|||
|
|
signal_type TEXT NOT NULL,
|
|||
|
|
direction TEXT NOT NULL CHECK (direction IN ('BUY','SELL','HOLD')),
|
|||
|
|
score NUMERIC(18,6),
|
|||
|
|
horizon TEXT,
|
|||
|
|
params_json JSONB,
|
|||
|
|
model_name TEXT,
|
|||
|
|
version TEXT,
|
|||
|
|
expires_at_utc TIMESTAMPTZ,
|
|||
|
|
UNIQUE(symbol_id, generated_at_utc, model_name, version)
|
|||
|
|
);
|
|||
|
|
CREATE INDEX ON signals (symbol_id, generated_at_utc DESC);
|
|||
|
|
|
|||
|
|
-- 5) 订单/成交
|
|||
|
|
CREATE TABLE orders (
|
|||
|
|
id BIGSERIAL PRIMARY KEY,
|
|||
|
|
signal_id BIGINT REFERENCES signals(id),
|
|||
|
|
symbol_id BIGINT NOT NULL REFERENCES symbols(id),
|
|||
|
|
side TEXT NOT NULL CHECK (side IN ('BUY','SELL')),
|
|||
|
|
order_type TEXT NOT NULL CHECK (order_type IN ('MKT','LMT')),
|
|||
|
|
qty NUMERIC(18,6) NOT NULL,
|
|||
|
|
price NUMERIC(18,6),
|
|||
|
|
time_in_force TEXT,
|
|||
|
|
status TEXT NOT NULL,
|
|||
|
|
broker_order_id TEXT,
|
|||
|
|
created_at_utc TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
|||
|
|
updated_at_utc TIMESTAMPTZ
|
|||
|
|
);
|
|||
|
|
CREATE UNIQUE INDEX IF NOT EXISTS orders_broker_unique ON orders(broker_order_id) WHERE broker_order_id IS NOT NULL;
|
|||
|
|
|
|||
|
|
CREATE TABLE executions (
|
|||
|
|
id BIGSERIAL PRIMARY KEY,
|
|||
|
|
order_id BIGINT NOT NULL REFERENCES orders(id),
|
|||
|
|
exec_time_utc TIMESTAMPTZ NOT NULL,
|
|||
|
|
price NUMERIC(18,6) NOT NULL,
|
|||
|
|
qty NUMERIC(18,6) NOT NULL,
|
|||
|
|
fee NUMERIC(18,6),
|
|||
|
|
liquidity TEXT
|
|||
|
|
);
|
|||
|
|
CREATE INDEX ON executions(order_id);
|
|||
|
|
CREATE INDEX ON executions(exec_time_utc);
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## SQLite 示例 DDL(简化版)
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
CREATE TABLE symbols (
|
|||
|
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|||
|
|
symbol TEXT NOT NULL,
|
|||
|
|
name TEXT,
|
|||
|
|
exchange TEXT NOT NULL,
|
|||
|
|
currency TEXT,
|
|||
|
|
tick_size REAL,
|
|||
|
|
lot_size REAL,
|
|||
|
|
sector TEXT,
|
|||
|
|
industry TEXT,
|
|||
|
|
is_active INTEGER DEFAULT 1,
|
|||
|
|
first_seen_utc TEXT,
|
|||
|
|
last_seen_utc TEXT,
|
|||
|
|
UNIQUE(symbol, exchange)
|
|||
|
|
);
|
|||
|
|
|
|||
|
|
CREATE TABLE bars_1m (
|
|||
|
|
symbol_id INTEGER NOT NULL,
|
|||
|
|
ts_utc TEXT NOT NULL,
|
|||
|
|
open REAL NOT NULL,
|
|||
|
|
high REAL NOT NULL,
|
|||
|
|
low REAL NOT NULL,
|
|||
|
|
close REAL NOT NULL,
|
|||
|
|
volume INTEGER,
|
|||
|
|
vwap REAL,
|
|||
|
|
trades_count INTEGER,
|
|||
|
|
source TEXT,
|
|||
|
|
PRIMARY KEY(symbol_id, ts_utc)
|
|||
|
|
);
|
|||
|
|
|
|||
|
|
CREATE TABLE signals (
|
|||
|
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
|||
|
|
symbol_id INTEGER NOT NULL,
|
|||
|
|
generated_at_utc TEXT NOT NULL,
|
|||
|
|
signal_type TEXT NOT NULL,
|
|||
|
|
direction TEXT NOT NULL,
|
|||
|
|
score REAL,
|
|||
|
|
horizon TEXT,
|
|||
|
|
params_json TEXT,
|
|||
|
|
model_name TEXT,
|
|||
|
|
version TEXT,
|
|||
|
|
expires_at_utc TEXT,
|
|||
|
|
UNIQUE(symbol_id, generated_at_utc, model_name, version)
|
|||
|
|
);
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## 典型索引与查询
|
|||
|
|
- 最新信号:
|
|||
|
|
```sql
|
|||
|
|
SELECT DISTINCT ON (s.symbol_id)
|
|||
|
|
s.*
|
|||
|
|
FROM signals s
|
|||
|
|
ORDER BY s.symbol_id, s.generated_at_utc DESC;
|
|||
|
|
```
|
|||
|
|
- 连接 1 分钟线与信号(取信号后最近 30 分钟):
|
|||
|
|
```sql
|
|||
|
|
SELECT b.*
|
|||
|
|
FROM signals s
|
|||
|
|
JOIN bars_1m b
|
|||
|
|
ON b.symbol_id = s.symbol_id
|
|||
|
|
AND b.ts_utc BETWEEN s.generated_at_utc AND s.generated_at_utc + INTERVAL '30 minutes'
|
|||
|
|
WHERE s.generated_at_utc >= NOW() - INTERVAL '1 day';
|
|||
|
|
```
|
|||
|
|
- 计算日收益(简单取 `close`):
|
|||
|
|
```sql
|
|||
|
|
SELECT symbol_id,
|
|||
|
|
as_of_date,
|
|||
|
|
close / LAG(close) OVER (PARTITION BY symbol_id ORDER BY as_of_date) - 1 AS daily_return
|
|||
|
|
FROM bars_1d;
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## 数据保留与分区建议(PG)
|
|||
|
|
- `ticks`:分区粒度=日,保留 30–90 天。
|
|||
|
|
- `bars_1m`:分区粒度=月,保留 180–365 天。
|
|||
|
|
- `bars_1d`、`signals`、`orders`、`executions`:长期保留。
|
|||
|
|
|
|||
|
|
## 与当前项目的对接要点
|
|||
|
|
- 数值规范:`change_ratio` 等“比率”统一用小数存储(代码已修正),写入时无需 `%`。
|
|||
|
|
- 表落地策略:
|
|||
|
|
- 初期:SQLite 单文件,开发/回测方便;
|
|||
|
|
- 扩展:PostgreSQL + 分区 + 指标物化视图(如聚合分钟线)。
|
|||
|
|
- ETL 幂等:抓取任务用 `(symbol_id, ts_utc)` 或 `(symbol_id, as_of_date)` 做 Upsert;避免重复数据导致回测偏差。
|
|||
|
|
|
|||
|
|
## 后续扩展
|
|||
|
|
- `features_*`:因子特征宽表(按频率区分:分钟/日)。
|
|||
|
|
- `models_registry`:模型注册与版本追踪。
|
|||
|
|
- `backtest_runs`:回测任务与指标结果表(收益、回撤、卡方检验等)。
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
如需,我可以按上述 DDL 直接生成 SQLite 数据库,并将 `monitor.py` 在每轮拉取后把 Top N 的 1 分钟 bar、信号落到库中(Upsert),用于后续回测与可视化分析。
|