Tutorial: create, append, query

Goal: Create a table on disk, append a Parquet segment, then query it with SQL.

Prereqs: timeseries-table-format is installed (see Installation).

What you’ll learn:

  • How a table root is created and stays self-contained on disk
  • How appends work (and what overlap detection is protecting you from)
  • How Session queries registered tables and returns a pyarrow.Table

Mental model

  • TimeSeriesTable manages the on-disk table and appends.
  • Session runs SQL over what you register (tables, Parquet datasets, etc.).

Steps

1) Create a table root (TimeSeriesTable.create)
2) Write a tiny Parquet segment (toy data)
3) Append it (append_parquet)
4) Create a SQL session (Session)
5) Register the table (register_tstable)
6) Query (Session.sql) → pyarrow.Table

The full example below is the exact code used in the docs (kept in sync with the repo):

from __future__ import annotations

from pathlib import Path

import pyarrow as pa
import pyarrow.parquet as pq

import timeseries_table_format as ttf


def _write_tiny_prices_parquet(path: Path) -> None:
    table = pa.table(
        {
            "ts": pa.array(
                [0, 3_600 * 1_000_000, 7_200 * 1_000_000],
                type=pa.timestamp("us"),
            ),
            "symbol": pa.array(["NVDA", "NVDA", "NVDA"], type=pa.string()),
            "close": pa.array([10.0, 20.0, 30.0], type=pa.float64()),
        }
    )
    pq.write_table(table, str(path))


def run(*, table_root: Path) -> pa.Table:
    table_root.mkdir(parents=True, exist_ok=True)

    tbl = ttf.TimeSeriesTable.create(
        table_root=str(table_root),
        time_column="ts",
        bucket="1h",
        entity_columns=["symbol"],
        timezone=None,
    )

    seg_path = table_root / "incoming" / "prices.parquet"
    seg_path.parent.mkdir(parents=True, exist_ok=True)
    _write_tiny_prices_parquet(seg_path)

    tbl.append_parquet(str(seg_path))

    sess = ttf.Session()
    sess.register_tstable("prices", str(table_root))

    return sess.sql(
        """
        select ts, symbol, close
        from prices
        order by ts
        """
    )


def main() -> None:
    out = run(table_root=Path("./my_table"))
    print(out)


if __name__ == "__main__":
    main()

What happens in the example?

Create a table

TimeSeriesTable.create(...) initializes a table root directory and writes initial metadata.

entity_columns explained

entity_columns=["symbol"] tells the table to track coverage for each symbol independently. AAPL at 10:00 and NVDA at 10:00 count as separate coverage, so appending data for one symbol never blocks appends for a different symbol in the same time window.
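
One way to picture per-entity coverage is as a set of (entity, bucket) keys. The sketch below is a conceptual model only, not the library's implementation: coverage_keys is a hypothetical helper, and it assumes 1h buckets aligned to the Unix epoch and timestamps given as integer microseconds.

```python
from __future__ import annotations

BUCKET_US = 3_600 * 1_000_000  # one hour in microseconds (mirrors bucket="1h")


def coverage_keys(rows: list[tuple[int, str]]) -> set[tuple[str, int]]:
    """Map (ts_us, symbol) rows to (symbol, hour_bucket_index) keys.

    Hypothetical helper: assumes buckets are aligned to the Unix epoch.
    """
    return {(symbol, ts_us // BUCKET_US) for ts_us, symbol in rows}


nvda = coverage_keys([(0, "NVDA"), (3_600 * 1_000_000, "NVDA")])
aapl = coverage_keys([(0, "AAPL"), (3_600 * 1_000_000, "AAPL")])

# Same hours, different symbols: the key sets share nothing.
assert nvda.isdisjoint(aapl)

# Same hour, same symbol (00:30 falls in bucket 0): the key sets collide.
assert not nvda.isdisjoint(coverage_keys([(1_800 * 1_000_000, "NVDA")]))
```

Because the symbol is part of every key, two symbols can cover the same hour without ever producing a shared key.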

Append a Parquet segment

append_parquet(...) adds the Parquet file as a new segment.

By default, if the Parquet file is outside the table root, it is copied under the table root before being committed (so the table is self-contained on disk).
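
The copy-if-outside rule can be sketched with the standard library alone. This is an illustration of the idea, not the library's code: stage_segment is a hypothetical helper, the "segments" directory name is made up, and leaving an already-inside file where it is is an assumption of the sketch.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def stage_segment(table_root: Path, src: Path) -> Path:
    """Return a path to `src` that lives under `table_root`, copying if needed.

    Hypothetical helper; the "segments" directory name is invented here.
    """
    root = table_root.resolve()
    src = src.resolve()
    try:
        src.relative_to(root)
        return src  # already inside the table root: use it in place
    except ValueError:
        pass  # outside the table root: fall through and copy

    dest_dir = root / "segments"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy contents and file metadata
    return dest
```

In the tutorial example the segment is written to table_root / "incoming", so it already sits under the table root.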

What happens if you run this twice?

If you run the example a second time against the same table root, append_parquet(...) will raise CoverageOverlapError. That's intentional — the table already has coverage for those hour buckets, so it refuses to re-ingest the same window. This is the overlap detection working as designed.
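
The second-run failure can be modelled with a toy coverage tracker. Everything here is a stand-in: ToyCoverage and ToyOverlapError are hypothetical names, and the real CoverageOverlapError and its checks live inside the library.

```python
from __future__ import annotations


class ToyOverlapError(Exception):
    """Stand-in for the library's CoverageOverlapError (hypothetical class)."""


class ToyCoverage:
    """Tracks which (entity, bucket) keys already hold data."""

    def __init__(self) -> None:
        self._covered: set[tuple[str, int]] = set()

    def append(self, keys: set[tuple[str, int]]) -> None:
        overlap = self._covered & keys
        if overlap:
            # The toy rejects the whole segment and records nothing.
            raise ToyOverlapError(f"already covered: {sorted(overlap)}")
        self._covered |= keys


coverage = ToyCoverage()
segment = {("NVDA", 0), ("NVDA", 1), ("NVDA", 2)}  # three hour buckets

coverage.append(segment)  # first run: accepted
try:
    coverage.append(segment)  # second run: same window again
    rejected = False
except ToyOverlapError:
    rejected = True

print(rejected)
```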

To reset for experimentation, delete the table root directory and start fresh.
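
A minimal reset helper, using only the standard library. reset_table_root is a hypothetical name and not part of the package; it deletes the directory and everything under it, so point it only at a scratch table.

```python
import shutil
from pathlib import Path


def reset_table_root(table_root: Path) -> bool:
    """Delete the table root and everything under it.

    Returns True if the directory existed, False if there was nothing to remove.
    """
    if not table_root.exists():
        return False
    shutil.rmtree(table_root)
    return True
```

Calling reset_table_root(Path("./my_table")) before run(...) lets the tutorial example run repeatedly, since run(...) recreates the directory and rewrites the toy segment.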

Query with SQL

Session is a DataFusion-backed SQL session. You register a table under a name and then query it.

Session.sql(...) returns a pyarrow.Table.

Streaming large results

For large result sets, Session.sql_reader(...) returns a streaming pyarrow.RecordBatchReader instead of materializing the full result into memory. See Reference: Session.

Notebook display

In IPython/Jupyter (including VS Code notebooks), pyarrow.Table results display as a bounded HTML preview by default (the return type is still a real pyarrow.Table).

  • Opt-out: set TTF_NOTEBOOK_DISPLAY=0 before importing timeseries_table_format, or call timeseries_table_format.disable_notebook_display()
  • Configure: call timeseries_table_format.enable_notebook_display(max_rows=..., max_cols=..., max_cell_chars=..., align=...)
  • Alignment: set TTF_NOTEBOOK_ALIGN=auto|left|right before importing timeseries_table_format (or pass align=... to enable_notebook_display(...))
  • Config file (TOML): set TTF_NOTEBOOK_CONFIG=path/to/ttf.toml before importing timeseries_table_format (on Python 3.10, install tomli to enable TOML parsing)
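
The environment variables above only take effect if they are set before the package is imported. A minimal sketch of that ordering follows; the try/except guard is only there so the snippet also runs where the package is not installed.

```python
import os

# Set these before the first import of timeseries_table_format.
os.environ["TTF_NOTEBOOK_ALIGN"] = "left"     # one of: auto, left, right
# os.environ["TTF_NOTEBOOK_DISPLAY"] = "0"    # uncomment to opt out entirely

try:
    import timeseries_table_format as ttf  # noqa: F401
except ImportError:
    ttf = None  # package not installed; the variables above are still set
```

In a notebook, this belongs in the first cell, ahead of any other cell that imports the package.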

Note

The Python API is synchronous. Long-running Rust operations run on an internal Tokio runtime and release the GIL while they execute.
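
One consequence of a blocking call that releases the GIL is that other Python threads can make progress while it runs. The sketch below shows the general pattern with a stand-in function; blocking_query is hypothetical, and this page does not say whether a single Session may be shared across threads, so the sketch does not assume it.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(name: str) -> str:
    """Stand-in for a synchronous call such as Session.sql(...).

    time.sleep releases the GIL, much like the Rust operations described above.
    """
    time.sleep(0.05)
    return f"{name}: done"


def run_concurrently(names: list[str]) -> list[str]:
    """Run one blocking call per name on a thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        return list(pool.map(blocking_query, names))


results = run_concurrently(["a", "b"])
print(results)
```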

Next:

  • Tutorial: Register + join
  • Concept: Buckets + overlap
  • Reference: Session