# timeseries-table-format (Python)
timeseries-table-format helps you manage local, append-only time-series tables on disk.
You append new data files over time and query across all of them with SQL, getting results back as
a Python table object. It also prevents you from accidentally loading the same time window twice.
New here? Start with the Key concepts table below, then jump into the Quickstart.
## A quick example of what this solves
Imagine you collect hourly price bars (open/high/low/close/volume) for a set of stock symbols — one Parquet file arrives each day. You want to query across 90 days of history with SQL, without building your own directory-scanning logic and without accidentally re-ingesting a day you already loaded. That's exactly the problem this library handles: it tracks what time windows are already covered (per symbol) and rejects duplicates automatically.
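The bookkeeping behind this can be sketched in a few lines of plain Python. This toy `Coverage` class is illustrative only, not the library's API: it floors each timestamp to its bucket and remembers which (entity, bucket) pairs have already been loaded, rejecting any append that would re-cover one of them.

```python
from datetime import datetime, timedelta, timezone


def bucket_start(ts: datetime, bucket: timedelta) -> datetime:
    # Floor a timestamp to the start of its bucket (e.g. the top of the hour).
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + ((ts - epoch) // bucket) * bucket


class Coverage:
    """Toy coverage tracker: remembers which (entity, bucket) pairs are loaded."""

    def __init__(self, bucket: timedelta) -> None:
        self.bucket = bucket
        self.seen: set[tuple[str, datetime]] = set()

    def append(self, rows: list[tuple[str, datetime]]) -> None:
        incoming = {(sym, bucket_start(ts, self.bucket)) for sym, ts in rows}
        overlap = incoming & self.seen
        if overlap:
            # Reject the whole append if any bucket is already covered.
            raise ValueError(f"overlapping buckets: {sorted(overlap)}")
        self.seen |= incoming


cov = Coverage(timedelta(hours=1))
nine_15 = datetime(2024, 1, 1, 9, 15, tzinfo=timezone.utc)
cov.append([("NVDA", nine_15)])  # first load of NVDA's 09:00 bucket: accepted
```

A second append of any NVDA timestamp inside the 09:00–10:00 window would raise, which is the same guarantee the library enforces per entity at its configured bucket granularity (its real on-disk bookkeeping and error types differ).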
## When to use this
- You ingest time-series data incrementally (new files arrive over time).
- You want SQL across segments/tables without building your own dataset plumbing.
- You want guardrails against accidentally re-ingesting the same time window (overlap detection).
## When not to use this (v0)
- You need S3/object storage backends.
- You need compaction, schema evolution, or upserts/merges.
- You want a centralized database/server.
## Key concepts (quick reference)
| Term | What it means |
|---|---|
| table root | A local directory that holds a time-series table: metadata, segments, and coverage data. |
| segment | A single Parquet file appended to a table. A table is made up of one or more segments. |
| bucket | The time granularity used for overlap detection (e.g. "1h", "1d"). Does not resample data. |
| entity | The logical identity of a time series (e.g. a stock symbol). Defined by entity_columns. |
| overlap detection | The guard that prevents you from appending data for the same entity + bucket twice. |
| Session | A DataFusion SQL session. Register tables into it and run SQL queries returning pyarrow.Table. |
| DataFusion | The SQL engine used internally. You don't need to install it separately. |
| Parquet | A columnar file format commonly used for analytics data. Your segments are Parquet files. |
| pyarrow | A Python library for working with columnar data. Installed as a dependency; query results are returned as pyarrow.Table. |
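To make the "bucket" row concrete: a bucket spec like `"1h"` or `"1d"` names a fixed time granularity, and overlap detection compares data at that granularity by flooring timestamps to bucket boundaries. The parser below is a toy, not the library's implementation; it just shows how the same timestamp lands in different buckets under different specs, while the data itself is never resampled.

```python
from datetime import datetime, timedelta, timezone

# Illustrative only: a minimal parser for bucket specs such as "1h" or "1d".
UNITS = {"m": timedelta(minutes=1), "h": timedelta(hours=1), "d": timedelta(days=1)}


def parse_bucket(spec: str) -> timedelta:
    count, unit = int(spec[:-1]), spec[-1]
    return count * UNITS[unit]


def floor_to_bucket(ts: datetime, spec: str) -> datetime:
    # Floor ts to the start of its bucket, measured from the Unix epoch.
    bucket = parse_bucket(spec)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + ((ts - epoch) // bucket) * bucket


ts = datetime(2024, 3, 5, 14, 37, tzinfo=timezone.utc)
print(floor_to_bucket(ts, "1h"))  # 2024-03-05 14:00:00+00:00
print(floor_to_bucket(ts, "1d"))  # 2024-03-05 00:00:00+00:00
```

With `"1h"`, two files covering 14:10 and 14:50 on the same day collide; with `"1d"`, any two files touching the same calendar day collide.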
## Install
See Installation for verification and notes about building from source.
## Quickstart: create → append → query
In this example:
`TimeSeriesTable` manages the on-disk table and handles appends. `Session` runs SQL over the tables you register and returns a `pyarrow.Table`.
```python
from __future__ import annotations

from pathlib import Path

import pyarrow as pa
import pyarrow.parquet as pq

import timeseries_table_format as ttf


def _write_tiny_prices_parquet(path: Path) -> None:
    # Three hourly rows for one symbol: timestamps at 00:00, 01:00, 02:00 UTC.
    table = pa.table(
        {
            "ts": pa.array(
                [0, 3_600 * 1_000_000, 7_200 * 1_000_000],
                type=pa.timestamp("us"),
            ),
            "symbol": pa.array(["NVDA", "NVDA", "NVDA"], type=pa.string()),
            "close": pa.array([10.0, 20.0, 30.0], type=pa.float64()),
        }
    )
    pq.write_table(table, str(path))


def run(*, table_root: Path) -> pa.Table:
    table_root.mkdir(parents=True, exist_ok=True)

    # Create the table: "1h" buckets, entity identified by the symbol column.
    tbl = ttf.TimeSeriesTable.create(
        table_root=str(table_root),
        time_column="ts",
        bucket="1h",
        entity_columns=["symbol"],
        timezone=None,
    )

    # Write a segment file and append it to the table.
    seg_path = table_root / "incoming" / "prices.parquet"
    seg_path.parent.mkdir(parents=True, exist_ok=True)
    _write_tiny_prices_parquet(seg_path)
    tbl.append_parquet(str(seg_path))

    # Register the table in a SQL session and query it.
    sess = ttf.Session()
    sess.register_tstable("prices", str(table_root))
    return sess.sql(
        """
        select ts, symbol, close
        from prices
        order by ts
        """
    )


def main() -> None:
    out = run(table_root=Path("./my_table"))
    print(out)


if __name__ == "__main__":
    main()
```
Next:

- Tutorial: Create, append, query
- Tutorial: Register + join
- Tutorial: Parameterized queries
- Tutorial: Real-world workflow
- Concept: Buckets + overlap
- Reference: Session