dlt + dbt on One Unity Catalog (and Why It's Not 'Two DLTs')

Python-native ingestion with dlt (dlthub), SQL transformation with dbt, and a single Unity Catalog schema as the contract between them. Notes from wiring them together on a real Databricks workspace — including the naming collision that bit me in actual code.

June 22, 2026 · 9 min · 1740 words · Andrea Bozzo

Brain and Brain UI: A Knowledge Base You Actually Own

Brain is a Git-backed Markdown knowledge base. Brain UI is the Rust/Leptos control plane on top of it. This is the system, the connection between the two halves, a real config-driven taxonomy, and the honest reasons I’m putting the core in the open.

June 6, 2026 · 12 min · 2362 words · Andrea Bozzo

Building My Personal Mini-Site as a Real Project

How I rebuilt my personal site as a versioned, reproducible static-site system with clear separation between landing page, blog, generators, and a small Go API companion.

May 20, 2026 · 9 min · 1837 words · Andrea Bozzo

FinOps for AI Data Platforms in 2026: Databricks vs AWS-Native vs DIY Iceberg on Top of Your Warehouse

Three lakehouse stacks for adding AI/ML and streaming on top of an existing warehouse, compared through a FinOps lens: DBUs, DPUs, TB-scanned, S3 GB-month, and egress, walked through real workload patterns with 2026 prices.

May 7, 2026 · 19 min · 3954 words · Andrea Bozzo

Zero Grappler: Data-Pipeline Thinking on a Microcontroller (Draft Notes from Before the Hardware Arrives)

Zero Grappler is a small no_std crate that applies a data-pipeline mindset to embedded ML: three traits, two async tasks, compile-time buffer sizing, zero allocations. This post is about the design choices — not yet a hardware report. The Pico 2 W smoke test on real silicon is still ahead of me.

April 21, 2026 · 12 min · 2370 words · Andrea Bozzo

Lance Format and LanceDB: Columnar Storage for the Embedding Age

Lance is a columnar storage format built for machine learning workloads — fast random access, native vector indexing, and zero-copy Arrow integration. This article walks through the format itself, how LanceDB builds on top of it, and how I wired it into a live NATS stream to build a simple semantic search layer over real-time events.

April 7, 2026 · 8 min · 1525 words · Andrea Bozzo

Guardrails for Tabular ML: A Data Engineer's Take on Data Leakage, Poisoning, and Brittle Pipelines

Most ML pipeline failures are not exotic model bugs — they are data issues that nobody encoded as checks. This article walks through building guardrails using pandas, Apache DataFusion, data contracts, and the Arrow C Data Interface.

March 23, 2026 · 13 min · 2649 words · Andrea Bozzo

1 Year of Claude Code: An Interview

Claude interviews Andrea Bozzo about a full year of using Claude Code in the terminal — the workflow, the custom skills, the rough edges, and the nuked database.

March 5, 2026 · 9 min · 1796 words · Andrea Bozzo

Harvesting vs Scraping: Building Both Sides in Rust with Ares and Ceres

Two Rust projects, one conceptual divide. Ares fetches arbitrary web pages and uses LLMs to extract structured data; Ceres harvests metadata from CKAN portals and indexes it semantically. Together they show what it looks like to move from scraping scripts to production data pipelines.

February 20, 2026 · 14 min · 2907 words · Andrea Bozzo

Designing a Data Profiler Around Apache Arrow: Lessons from dataprof

A design story of dataprof: why I built a profiler around Apache Arrow, how it changed the architecture, and how this journey led to contributions to arrow-rs’ Parquet reader.

February 5, 2026 · 11 min · 2316 words · Andrea Bozzo