Andrea Bozzo | Blog

👋 Welcome to my technical blog!

I write about Data Engineering, Rust, Go, Python and open source technologies.

Exploring lakehouse architectures, real-time streaming and the modern data world.

dlt + dbt on One Unity Catalog (and Why It's Not 'Two DLTs')

Python-native ingestion with dlt (dlthub), SQL transformation with dbt, and a single Unity Catalog schema as the contract between them. Notes from wiring them together on a real Databricks workspace — including the naming collision that bit me in actual code.

June 22, 2026 · 9 min · 1740 words · Andrea Bozzo
Brain and Brain UI: A Knowledge Base You Actually Own

Brain is a Git-backed Markdown knowledge base. Brain UI is the Rust/Leptos control plane on top of it. This is the system, the connection between the two halves, a real config-driven taxonomy, and the honest reasons I’m putting the core in the open.

June 6, 2026 · 12 min · 2362 words · Andrea Bozzo
Building My Personal Mini-Site as a Real Project

How I rebuilt my personal site as a versioned, reproducible static-site system with clear separation between landing page, blog, generators, and a small Go API companion.

May 20, 2026 · 9 min · 1837 words · Andrea Bozzo
FinOps for AI Data Platforms in 2026: Databricks vs AWS-Native vs DIY Iceberg on Top of Your Warehouse

Three lakehouse stacks for adding AI/ML and streaming on top of an existing warehouse, compared through a FinOps lens: DBUs, DPUs, TB-scanned, S3 GB-month, and egress, walked through real workload patterns with 2026 prices.

May 7, 2026 · 19 min · 3954 words · Andrea Bozzo
Zero Grappler: Data-Pipeline Thinking on a Microcontroller (Draft Notes from Before the Hardware Arrives)

Zero Grappler is a small no_std crate that applies a data-pipeline mindset to embedded ML: three traits, two async tasks, compile-time buffer sizing, zero allocations. This post is about the design choices — not yet a hardware report. The Pico 2 W smoke test on real silicon is still ahead of me.

April 21, 2026 · 12 min · 2370 words · Andrea Bozzo
Lance Format and LanceDB: Columnar Storage for the Embedding Age

Lance is a columnar storage format built for machine learning workloads — fast random access, native vector indexing, and zero-copy Arrow integration. This article walks through the format itself, how LanceDB builds on top of it, and how I wired it into a live NATS stream to build a simple semantic search layer over real-time events.

April 7, 2026 · 8 min · 1525 words · Andrea Bozzo

Guardrails for Tabular ML: A Data Engineer's Take on Data Leakage, Poisoning, and Brittle Pipelines

Most ML pipeline failures are not exotic model bugs — they are data issues that nobody encoded as checks. This article walks through building guardrails using pandas, Apache DataFusion, data contracts, and the Arrow C Data Interface.

March 23, 2026 · 13 min · 2649 words · Andrea Bozzo
1 Year of Claude Code: An Interview

Claude interviews Andrea Bozzo about a full year of using Claude Code in the terminal — the workflow, the custom skills, the rough edges, and the nuked database.

March 5, 2026 · 9 min · 1796 words · Andrea Bozzo
Harvesting vs Scraping: Building Both Sides in Rust with Ares and Ceres

Two Rust projects, one conceptual divide. Ares fetches arbitrary web pages and uses LLMs to extract structured data; Ceres harvests metadata from CKAN portals and indexes it semantically. Together they show what it looks like to move from scraping scripts to production data pipelines.

February 20, 2026 · 14 min · 2907 words · Andrea Bozzo
Designing a Data Profiler Around Apache Arrow: Lessons from dataprof

A design story of dataprof: why I built a profiler around Apache Arrow, how it changed the architecture, and how this journey led to contributions to arrow-rs’ Parquet reader.

February 5, 2026 · 11 min · 2316 words · Andrea Bozzo