Andrea Bozzo
Open-source tools, lakehouse experiments, and engineering notes from the places where pipelines, storage, and developer tooling meet.
Selected proof
Start here if you want the clearest signal: one built tool, one upstream contribution track, and one writing archive.
dataprof, an Arrow-native profiler in Rust with CLI and Python surfaces, designed for bounded-memory data quality workflows.
Public PRs across Arrow, DataFusion, Iceberg Rust, and Fluss Rust, tied back to real downstream constraints.
Long-form articles on data platforms, Rust/Python systems, lakehouse tradeoffs, and open-source project notes.
System View
The public surface is intentionally hand-built, but the repository behind it is a real delivery system: static homepage, Hugo archive, generated work pages, Rust/WASM workbench, Go harvester, and a Vercel companion API.
The landing page is plain HTML, CSS, and JavaScript so the public surface stays lightweight and explicit.
Hugo handles the writing archive, while the workbench logic is mirrored between JavaScript and Rust compiled to WebAssembly.
Go generators turn structured JSON and repository data into case-study pages, contribution cards, and static artifacts.
GitHub Pages serves the static site; Vercel carries only the live GitHub stats and badge endpoints.
Workbench
One input for case studies, blog posts, open-source work, reviewed papers, and the technical threads that connect them.
Open Source
A few projects I have sent patches to. The list is pulled from the repository README.
Reviews & Papers
Public material for two IEEE paper submissions, with benchmarks, demos, and reproducible companion assets.
Contact
For freelance data infrastructure work, recruiting context, or technical follow-up, email is the cleanest first step.