Skip to main content

ClickHouse Proto Gen

ClickHouse Proto Gen is a standalone CLI tool that connects to a ClickHouse cluster, introspects table schemas, and generates corresponding Protocol Buffer (.proto) schema files. It bridges the gap between ClickHouse's powerful analytics capabilities and Protocol Buffers' type-safe, cross-platform data serialization.

Overview

ClickHouse Proto Gen automates the tedious process of manually creating Protocol Buffer definitions from your ClickHouse tables:

  • Auto-discovery: Automatically extracts table schemas from ClickHouse
  • Intelligent Type Mapping: Maps ClickHouse types to appropriate Proto3 types
  • Preserves Documentation: Optionally includes table and column comments
  • Selective Generation: Generate proto for specific tables or entire databases
  • Configurable: Supports both CLI flags and YAML configuration files

Key Features

Automatic Schema Discovery

Point the tool at your ClickHouse database and it discovers:

  • Table structures with all columns
  • Data types with proper proto mappings
  • Column comments and documentation
  • Nullable fields and arrays

Smart Type Mapping

Handles complex ClickHouse types intelligently:

  • Native types: Int, UInt, Float, String, DateTime, UUID
  • Large integers: Int128, Int256 mapped to strings
  • Arrays: ClickHouse Array(T) becomes proto repeated T
  • Nullable: Special nullable filter handling
  • Complex types: Map, Tuple, Enum with appropriate representations
  • UInt64 conversion: Optional string conversion for JavaScript/TypeScript safety

Developer Workflow Integration

Designed to fit into modern development workflows:

  • CI/CD friendly: Use in automated pipelines
  • Version control: Generate proto files to commit alongside code
  • Type safety: Enable type-safe data access across languages
  • Cross-platform: Generate code for Go, Python, JavaScript, Rust, and more

Type Mappings

Numeric Types

ClickHouse TypeProto TypeNotes
Int8, Int16, Int32int32
Int64int64
Int128, Int256stringNo native protobuf support
UInt8, UInt16, UInt32uint32
UInt64uint64Optionally convert to string for JS
Float32float
Float64double
Decimal*stringPreserves precision

Other Types

ClickHouse TypeProto TypeNotes
String, FixedStringstring
Date, DateTimestringISO 8601 format
Boolbool
UUIDstring
Array(T)repeated T
Nullable(T)Uses nullable filtersSpecial handling
MapstringJSON representation
TuplestringJSON representation
Enum8, Enum16stringEnum value as string
IPv4, IPv6stringIP address as string

UInt64 to String Conversion

JavaScript's Number type cannot safely represent integers beyond Number.MAX_SAFE_INTEGER (2^53-1). For JavaScript/TypeScript clients, you can configure specific UInt64 fields to be converted to strings:

conversion:
uint64_to_string:
fct_my_table:
- my_column_a
- my_column_b
int_my_other_table:
- my_column_x

Or via CLI:

# Specific field in specific table
--uint64-to-string "fct_my_table.my_column_a"

# All fields in table
--uint64-to-string "int_my_other_table.*"

# Specific field in all tables
--uint64-to-string "*.my_column_x"

# All UInt64 fields everywhere
--uint64-to-string "*.*"

Use Cases

Type-Safe Data Access

Generate strongly-typed client libraries:

  • Go structs with proper field types
  • Python dataclasses with type hints
  • TypeScript interfaces with strict typing
  • Rust structs with compile-time guarantees

API Development

Build APIs with consistent schemas:

  • Use with CBT API to generate REST endpoints
  • Create gRPC services with type-safe contracts
  • Generate OpenAPI specs from proto annotations

Data Pipeline Engineering

Ensure type consistency across pipelines:

  • Validate data transformations with type checking
  • Generate consistent schemas for different environments
  • Version control your data contracts

Cross-Platform Development

Enable multiple language clients:

  • Backend services in Go
  • Analytics scripts in Python
  • Frontend applications in TypeScript
  • Data processing in Rust

Examples

Basic Usage

# Generate proto for specific tables
clickhouse-proto-gen \
--dsn "clickhouse://localhost:9000/mydb" \
--tables users,orders,products \
--out ./proto \
--package myapp.v1

Using Docker

# Generate with docker
docker run --rm -v "$(pwd):/workspace" \
ethpandaops/clickhouse-proto-gen \
--dsn "clickhouse://user:[email protected]:9000/mydb" \
--tables beacon_blocks,attestations \
--out /workspace/proto \
--package ethereum.v1 \
--include-comments

Configuration File

# config.yaml
dsn: clickhouse://user:password@localhost:9000/mydb

tables:
- users
- orders
- products

output_dir: ./proto
package: myapp.clickhouse.v1
go_package: github.com/myorg/myapp/gen/clickhousev1
include_comments: true

Run with:

clickhouse-proto-gen --config config.yaml

Cross-Database Generation

Generate proto files for tables across multiple databases:

clickhouse-proto-gen \
--dsn "clickhouse://localhost:9000" \
--tables "db1.users,db2.products,db3.orders" \
--out ./proto

Example Output

Given this ClickHouse table:

CREATE TABLE users (
id UInt64,
email String,
name Nullable(String),
created_at DateTime,
tags Array(String)
) ENGINE = MergeTree()
ORDER BY id
COMMENT = 'User accounts table';

ClickHouse Proto Gen generates:

syntax = "proto3";

package myapp.v1;

option go_package = "github.com/myorg/myapp/gen/v1";

// User accounts table
message Users {
uint64 id = 11;
string email = 12;
optional string name = 13;
string created_at = 14;
repeated string tags = 15;
}

Integration with ethPandaOps Stack

ClickHouse Proto Gen is used internally by:

  • CBT API: Automatically generates proto files for discovered tables
  • CBT: Can be used to generate type-safe models for transformations
  • Xatu Data Pipeline: Generates proto definitions for Xatu data tables

Resources

Community

Need help or want to contribute?