
timeseries-table-format (Python)

timeseries-table-format helps you manage local, append-only time-series tables on disk. You append new Parquet files over time and query across all of them with SQL, getting the results back as a pyarrow.Table. It also prevents you from accidentally loading the same time window twice.

New here? Start with the Key concepts table below, then jump into the Quickstart.

A quick example of what this solves

Imagine you collect hourly price bars (open/high/low/close/volume) for a set of stock symbols — one Parquet file arrives each day. You want to query across 90 days of history with SQL, without building your own directory-scanning logic and without accidentally re-ingesting a day you already loaded. That's exactly the problem this library handles: it tracks what time windows are already covered (per symbol) and rejects duplicates automatically.
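
To make the idea concrete, here is a small in-memory model of per-symbol coverage tracking with daily buckets. This is a conceptual sketch only. `CoverageLedger` is a hypothetical name, not part of the library, and two behaviors are our own assumptions: a rejected append records nothing, and the same day is allowed for a different symbol.

```python
from __future__ import annotations

from datetime import date


class CoverageLedger:
    """Hypothetical model of per-entity coverage (not the library's implementation)."""

    def __init__(self) -> None:
        # Maps an entity (e.g. a stock symbol) to the daily buckets already ingested.
        self._covered: dict[str, set[date]] = {}

    def append(self, symbol: str, days: set[date]) -> None:
        seen = self._covered.setdefault(symbol, set())
        overlap = seen & days
        if overlap:
            # Assumption: reject the whole append and record nothing.
            listed = ", ".join(d.isoformat() for d in sorted(overlap))
            raise ValueError(f"{symbol}: already ingested {listed}")
        seen |= days  # in-place update of the set stored in the dict


ledger = CoverageLedger()
ledger.append("NVDA", {date(2024, 1, 2)})
ledger.append("AAPL", {date(2024, 1, 2)})  # same day, different symbol: accepted
try:
    ledger.append("NVDA", {date(2024, 1, 2), date(2024, 1, 3)})
except ValueError as err:
    print(err)  # prints: NVDA: already ingested 2024-01-02
```

Keying coverage by (symbol, day) rather than by file name is what lets a re-delivered file with a new name still be caught.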

When to use this

  • You ingest time-series data incrementally (new files arrive over time).
  • You want SQL across segments/tables without building your own dataset plumbing.
  • You want guardrails against accidentally re-ingesting the same time window (overlap detection).

When not to use this (v0)
  • You need S3/object storage backends.
  • You need compaction, schema evolution, or upserts/merges.
  • You want a centralized database/server.

Key concepts (quick reference)

| Term | What it means |
| --- | --- |
| table root | A local directory that holds a time-series table: metadata, segments, and coverage data. |
| segment | A single Parquet file appended to a table. A table is made up of one or more segments. |
| bucket | The time granularity used for overlap detection (e.g. "1h", "1d"). It does not resample your data. |
| entity | The logical identity of a time series (e.g. a stock symbol). Defined by entity_columns. |
| overlap detection | The guard that prevents you from appending data for the same entity + bucket twice. |
| Session | A DataFusion SQL session. Register tables into it and run SQL queries that return a pyarrow.Table. |
| DataFusion | The SQL engine used internally. You don't need to install it separately. |
| Parquet | A columnar file format commonly used for analytics data. Your segments are Parquet files. |
| pyarrow | A Python library for working with columnar data. Installed as a dependency; query results are returned as pyarrow.Table. |
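
The bucket and entity concepts work together: each row is assigned an (entity, bucket) key, and overlap detection compares keys, not rows. The sketch below shows that mapping for a bucket spec such as "1h". It is an illustration only. `parse_bucket` and `bucket_start` are hypothetical helpers, and we assume naive timestamps and buckets aligned to the Unix epoch; the library's own parsing and alignment rules may differ.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Hypothetical unit table for specs like "1h" or "15m"; not the library's parser.
_UNITS = {
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
}


def parse_bucket(spec: str) -> timedelta:
    count, unit = int(spec[:-1]), spec[-1]
    return count * _UNITS[unit]


def bucket_start(ts: datetime, spec: str) -> datetime:
    """Floor a naive timestamp to the start of its bucket (epoch-aligned, by assumption)."""
    width = parse_bucket(spec)
    epoch = datetime(1970, 1, 1)
    return epoch + ((ts - epoch) // width) * width


rows = [
    datetime(2024, 1, 2, 10, 5),
    datetime(2024, 1, 2, 10, 55),
    datetime(2024, 1, 2, 11, 0),
]
keys = {("NVDA", bucket_start(ts, "1h")) for ts in rows}
# Three rows map to two (entity, bucket) keys; the rows themselves are not resampled.
print(len(rows), len(keys))  # prints: 3 2
```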

Install

pip install timeseries-table-format

See Installation for verification and notes about building from source.
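
If you just want a quick check from Python that the package is installed, the standard library's importlib.metadata can report the installed version. This is a minimal sketch. `installed_version` is a hypothetical helper name, and the Installation page may describe a different check.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "timeseries-table-format") -> str | None:
    """Return the installed distribution's version, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "timeseries-table-format is not installed")
```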

Quickstart: create → append → query

In this example:

  • TimeSeriesTable manages the on-disk table and appends.
  • Session runs SQL over the tables you register and returns a pyarrow.Table.

from __future__ import annotations

from pathlib import Path

import pyarrow as pa
import pyarrow.parquet as pq

import timeseries_table_format as ttf


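def _write_tiny_prices_parquet(path: Path) -> None:
    # Three hourly NVDA bars. The "ts" values are microseconds since the Unix epoch
    # (1970-01-01 00:00, 01:00 and 02:00).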
    table = pa.table(
        {
            "ts": pa.array(
                [0, 3_600 * 1_000_000, 7_200 * 1_000_000],
                type=pa.timestamp("us"),
            ),
            "symbol": pa.array(["NVDA", "NVDA", "NVDA"], type=pa.string()),
            "close": pa.array([10.0, 20.0, 30.0], type=pa.float64()),
        }
    )
    pq.write_table(table, str(path))


def run(*, table_root: Path) -> pa.Table:
    table_root.mkdir(parents=True, exist_ok=True)

    # Create the table. Overlap detection uses 1-hour buckets per symbol.
    tbl = ttf.TimeSeriesTable.create(
        table_root=str(table_root),
        time_column="ts",
        bucket="1h",
        entity_columns=["symbol"],
        timezone=None,
    )

    seg_path = table_root / "incoming" / "prices.parquet"
    seg_path.parent.mkdir(parents=True, exist_ok=True)
    _write_tiny_prices_parquet(seg_path)

    # Append the file as a new segment. Overlapping symbol + bucket data is rejected.
    tbl.append_parquet(str(seg_path))

    # Register the table under the SQL name "prices", then query it.
    sess = ttf.Session()
    sess.register_tstable("prices", str(table_root))

    return sess.sql(
        """
        select ts, symbol, close
        from prices
        order by ts
        """
    )


def main() -> None:
    out = run(table_root=Path("./my_table"))
    print(out)


if __name__ == "__main__":
    main()
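
If the raw integers in _write_tiny_prices_parquet are hard to read, you can decode them with the standard library. Because the column is pa.timestamp("us") and the table uses timezone=None, we treat the values as naive microseconds since the Unix epoch. The bucket-index arithmetic below is our own illustration of why each row lands in its own 1h bucket, not the library's code.

```python
from datetime import datetime, timedelta

# Raw "ts" values from _write_tiny_prices_parquet, in microseconds since the epoch.
raw_us = [0, 3_600 * 1_000_000, 7_200 * 1_000_000]
epoch = datetime(1970, 1, 1)  # naive, matching timezone=None
stamps = [epoch + timedelta(microseconds=v) for v in raw_us]
print([ts.isoformat() for ts in stamps])
# prints: ['1970-01-01T00:00:00', '1970-01-01T01:00:00', '1970-01-01T02:00:00']

# Integer-dividing by one hour (in microseconds) gives each row's 1h bucket index.
hour_us = 3_600 * 1_000_000
buckets = {("NVDA", v // hour_us) for v in raw_us}
print(sorted(buckets))  # prints: [('NVDA', 0), ('NVDA', 1), ('NVDA', 2)]
```

So the first append covers three (symbol, bucket) pairs for NVDA. Appending the same file a second time would hit all three again, which is exactly the case overlap detection is there to reject.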

Next:
  • Tutorial: Create, append, query
  • Tutorial: Register + join
  • Tutorial: Parameterized queries
  • Tutorial: Real-world workflow
  • Concept: Buckets + overlap
  • Reference: Session