ADR 0002 — Polars Lazy Scanning for Large Dataset Processing – Prometheus-X Components & Services

Status: Accepted

Context

The Maskott CSV contains 1,091,624 rows. Loading the full dataset into memory with pandas would require roughly 1–2 GB of RAM, which is not suitable for partner-controlled laptops.

Decision

Use Polars lazy scanning (pl.scan_csv) for all CSV ingestion, with chunked iteration via .slice(offset, chunk_size).collect(). Output is streamed to JSONL line-by-line. Pandas may be used only for small outputs and test fixtures.

Consequences