Case Study

Druid Segment Bridge

Query Apache Druid segments directly through DataFusion.

Work in progress · Rust · Apache Druid · DataFusion · Arrow · Query Engine
[Image: druid-datafusion-bridge project logo]

System anatomy

  1. Inputs

    • Druid segment files (index.drd)
    • Wikipedia segment fixtures
    • DataFusion SessionContext
    • Test query plans
  2. Core

    • Segment parser layer
    • TableProvider adapter
    • ExecutionPlan adapter
    • Arrow-friendly conversion
  3. Outputs

    • DataFusion-queryable tables
    • Integration test results
    • Offline segment access
    • Translated dictionaries + metrics
Constraints
  • No running Druid cluster needed
  • Vectorized where feasible
  • Work-in-progress evaluation
  • Format quirks preserved

Why it exists

Query engines become more reusable when storage formats can be inspected directly instead of only through the original serving system. Druid's segment format embodies years of analytical engineering, but segments are usually reached only through a running Druid cluster. This project asks what becomes possible when a Rust query stack reads those segment files directly.

Technical center

The library reads Druid segment structures such as index.drd data and maps them into DataFusion's `TableProvider` and `ExecutionPlan` abstractions, aiming for vectorized, Arrow-friendly execution where possible. The hard part is translating storage internals, dictionaries, metrics, and bitmap-oriented access patterns into those abstractions without pretending the original format was designed as a simple Arrow file.
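To make the translation problem concrete, here is a minimal std-only sketch of the access pattern involved. It assumes a dictionary-encoded string dimension with one bitmap per distinct value, which is how Druid describes its string columns. It answers an equality filter from the bitmap alone, then materializes the matching rows and a metric sum as plain vectors. The names `DictColumn`, `rows_equal`, `decode`, and `sum_metric` are hypothetical and are not taken from the project; real segments also compress their columns and bitmaps, which this sketch replaces with uncompressed `u64` words.

```rust
/// Hypothetical in-memory view of one dictionary-encoded string column.
struct DictColumn {
    /// Sorted distinct values; the position is the dictionary id.
    dictionary: Vec<String>,
    /// One dictionary id per row.
    row_ids: Vec<u32>,
    /// One bitmap per dictionary id; bit r is set when row r holds that value.
    bitmaps: Vec<Vec<u64>>,
}

impl DictColumn {
    /// Build the dictionary, row ids, and bitmaps from raw row values.
    fn from_rows(rows: &[&str]) -> Self {
        let mut sorted: Vec<&str> = rows.to_vec();
        sorted.sort_unstable();
        sorted.dedup();
        let dictionary: Vec<String> = sorted.iter().map(|s| s.to_string()).collect();
        let words = (rows.len() + 63) / 64;
        let mut bitmaps = vec![vec![0u64; words]; dictionary.len()];
        let mut row_ids = Vec::with_capacity(rows.len());
        for (row, value) in rows.iter().enumerate() {
            let id = sorted.binary_search(value).expect("value is in dictionary");
            row_ids.push(id as u32);
            bitmaps[id][row / 64] |= 1u64 << (row % 64);
        }
        DictColumn { dictionary, row_ids, bitmaps }
    }

    /// Row numbers where the column equals `value`, read from the bitmap only.
    fn rows_equal(&self, value: &str) -> Vec<usize> {
        let id = match self.dictionary.binary_search_by(|d| d.as_str().cmp(value)) {
            Ok(id) => id,
            Err(_) => return Vec::new(), // value never occurs in this column
        };
        let mut rows = Vec::new();
        for (w, &word) in self.bitmaps[id].iter().enumerate() {
            let mut bits = word;
            while bits != 0 {
                rows.push(w * 64 + bits.trailing_zeros() as usize);
                bits &= bits - 1; // clear the lowest set bit
            }
        }
        rows
    }

    /// Turn selected rows back into string values via the dictionary.
    fn decode(&self, rows: &[usize]) -> Vec<&str> {
        rows.iter()
            .map(|&r| self.dictionary[self.row_ids[r] as usize].as_str())
            .collect()
    }
}

/// Sum a metric column over the selected rows.
fn sum_metric(metric: &[i64], rows: &[usize]) -> i64 {
    rows.iter().map(|&r| metric[r]).sum()
}

fn main() {
    let channel = DictColumn::from_rows(&["#en", "#fr", "#en", "#de", "#en"]);
    let added: Vec<i64> = vec![10, 20, 30, 40, 50];
    let rows = channel.rows_equal("#en");
    println!("rows = {:?}", rows);
    println!("values = {:?}", channel.decode(&rows));
    println!("sum(added) = {}", sum_metric(&added, &rows));
}
```

The point of the sketch is the mismatch it exposes. The filter is answered from an index structure before any values are decoded, whereas an Arrow batch expects contiguous, already-decoded (or dictionary-typed) arrays. An adapter has to decide where that decode step happens.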

Current proof points

Even as a work in progress, the project already has concrete pieces to evaluate: a dedicated segment parser layer, DataFusion `TableProvider` and `ExecutionPlan` adapters, Wikipedia segment fixtures, and integration tests around offline segment access, rather than hand-wavy architectural claims. The current value is exploratory but specific: each parser and adapter clarifies how much of Druid's segment model can be exposed through modern Arrow and DataFusion interfaces.