Tutorial: create, append, query
Goal: Create a table on disk, append a Parquet segment, then query it with SQL.
Prereqs: timeseries-table-format is installed (see Installation).
What you’ll learn:
- How a table root is created and stays self-contained on disk
- How appends work (and what overlap detection is protecting you from)
- How Session queries registered tables and returns a pyarrow.Table
Mental model
`TimeSeriesTable` manages the on-disk table and appends. `Session` runs SQL over what you register (tables, Parquet datasets, etc.).
Steps
1) Create a table root (TimeSeriesTable.create)
2) Write a tiny Parquet segment (toy data)
3) Append it (append_parquet)
4) Create a SQL session (Session)
5) Register the table (register_tstable)
6) Query (Session.sql) → pyarrow.Table
The full example below is the exact code used in docs (kept in sync with the repo):
```python
from __future__ import annotations

from pathlib import Path

import pyarrow as pa
import pyarrow.parquet as pq

import timeseries_table_format as ttf


def _write_tiny_prices_parquet(path: Path) -> None:
    table = pa.table(
        {
            "ts": pa.array(
                [0, 3_600 * 1_000_000, 7_200 * 1_000_000],
                type=pa.timestamp("us"),
            ),
            "symbol": pa.array(["NVDA", "NVDA", "NVDA"], type=pa.string()),
            "close": pa.array([10.0, 20.0, 30.0], type=pa.float64()),
        }
    )
    pq.write_table(table, str(path))


def run(*, table_root: Path) -> pa.Table:
    table_root.mkdir(parents=True, exist_ok=True)
    tbl = ttf.TimeSeriesTable.create(
        table_root=str(table_root),
        time_column="ts",
        bucket="1h",
        entity_columns=["symbol"],
        timezone=None,
    )

    seg_path = table_root / "incoming" / "prices.parquet"
    seg_path.parent.mkdir(parents=True, exist_ok=True)
    _write_tiny_prices_parquet(seg_path)
    tbl.append_parquet(str(seg_path))

    sess = ttf.Session()
    sess.register_tstable("prices", str(table_root))
    return sess.sql(
        """
        select ts, symbol, close
        from prices
        order by ts
        """
    )


def main() -> None:
    out = run(table_root=Path("./my_table"))
    print(out)


if __name__ == "__main__":
    main()
```
What happens in the example?
Create a table
TimeSeriesTable.create(...) initializes a table root directory and writes initial metadata.
entity_columns explained
entity_columns=["symbol"] tells the table that coverage is tracked per symbol independently.
That means AAPL at 10:00 and NVDA at 10:00 are considered separate coverage — appending data
for one symbol never blocks appends for a different symbol in the same time window.
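To make the per-entity idea concrete, here is a minimal pure-Python sketch of coverage keyed by (entity, bucket) pairs — an illustration of the concept, not the library's actual implementation:

```python
BUCKET_US = 3_600 * 1_000_000  # one "1h" bucket, in microseconds

def bucket_of(ts_us: int) -> int:
    """Map a microsecond timestamp to its hour-bucket index."""
    return ts_us // BUCKET_US

# Coverage is tracked as (entity, bucket) pairs, so entities never collide.
coverage: set[tuple[str, int]] = set()
coverage.add(("NVDA", bucket_of(0)))

# AAPL at the same hour is separate coverage -- no conflict with NVDA.
assert ("AAPL", bucket_of(0)) not in coverage
# But NVDA again anywhere inside that hour would overlap.
assert ("NVDA", bucket_of(30 * 60 * 1_000_000)) in coverage
```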
Append a Parquet segment
append_parquet(...) adds the Parquet file as a new segment.
By default, if the Parquet file is outside the table root, it is copied under the table root before being committed (so the table is self-contained on disk).
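The self-containment behavior can be pictured with a plain file copy. This is a sketch of the idea only — the `segments` directory name here is hypothetical, not the library's real on-disk layout:

```python
import shutil
import tempfile
from pathlib import Path

def ingest_copy(table_root: Path, src: Path) -> Path:
    """Copy an external file under the table root so the table stays
    self-contained on disk (illustrative; directory name is made up)."""
    dest_dir = table_root / "segments"  # hypothetical internal layout
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    return dest

root = Path(tempfile.mkdtemp())
external = Path(tempfile.mkdtemp()) / "prices.parquet"
external.write_bytes(b"stand-in bytes")  # placeholder, not real Parquet
copied = ingest_copy(root, external)
assert root in copied.parents  # the table root now owns a copy of the data
```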
What happens if you run this twice?
If you run the example a second time against the same table root, append_parquet(...) will
raise CoverageOverlapError. That's intentional — the table already has coverage for those
hour buckets, so it refuses to re-ingest the same window. This is the overlap detection
working as designed.
To reset for experimentation, delete the table root directory and start fresh.
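The refusal boils down to a bucket-intersection check. A pure-Python sketch of the mechanism (the real detection lives inside the library):

```python
BUCKET_US = 3_600 * 1_000_000  # "1h" buckets over microsecond timestamps

def hour_buckets(timestamps_us: list[int]) -> set[int]:
    """Bucket indices touched by a batch of timestamps."""
    return {t // BUCKET_US for t in timestamps_us}

ts = [0, 3_600 * 1_000_000, 7_200 * 1_000_000]
committed = hour_buckets(ts)  # first run: buckets {0, 1, 2} are covered
incoming = hour_buckets(ts)   # second run: the same segment arrives again

overlap = committed & incoming
assert overlap == {0, 1, 2}   # non-empty intersection -> the append is refused
```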
Query with SQL
Session is a DataFusion-backed SQL session. You register a table under a name and then query it.
Session.sql(...) returns a pyarrow.Table.
Streaming large results
For large result sets, Session.sql_reader(...) returns a streaming pyarrow.RecordBatchReader
instead of materializing the full result into memory. See Reference: Session.
Notebook display
In IPython/Jupyter (including VS Code notebooks), pyarrow.Table results display as a bounded HTML preview by default (the return type is still a real pyarrow.Table).
- Opt-out: set `TTF_NOTEBOOK_DISPLAY=0` before importing `timeseries_table_format`, or call `timeseries_table_format.disable_notebook_display()`
- Configure: call `timeseries_table_format.enable_notebook_display(max_rows=..., max_cols=..., max_cell_chars=..., align=...)`
- Alignment: set `TTF_NOTEBOOK_ALIGN=auto|left|right` before importing `timeseries_table_format` (or pass `align=...` to `enable_notebook_display(...)`)
- Config file (TOML): set `TTF_NOTEBOOK_CONFIG=path/to/ttf.toml` before importing `timeseries_table_format` (on Python 3.10, install `tomli` to enable TOML parsing)
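A `ttf.toml` might look like the sketch below. The key names are an assumption — they mirror the `enable_notebook_display(...)` parameters above, but the authoritative schema is in the reference docs:

```toml
# Hypothetical keys, assumed to mirror enable_notebook_display(...);
# consult the library's reference documentation for the real schema.
max_rows = 20
max_cols = 10
max_cell_chars = 80
align = "auto"
```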
Note
The Python API is synchronous. Internally, long-running Rust operations run on an internal Tokio runtime and release the GIL.
Next:
- Tutorial: Register + join
- Concept: Buckets + overlap
- Reference: Session