ClickHouse Proto Gen
ClickHouse Proto Gen is a standalone CLI tool that connects to a ClickHouse cluster, introspects table schemas, and generates corresponding Protocol Buffer (.proto) schema files. It bridges the gap between ClickHouse's powerful analytics capabilities and Protocol Buffers' type-safe, cross-platform data serialization.
Overview
ClickHouse Proto Gen automates the tedious process of manually creating Protocol Buffer definitions from your ClickHouse tables:
- Auto-discovery: Automatically extracts table schemas from ClickHouse
- Intelligent Type Mapping: Maps ClickHouse types to appropriate Proto3 types
- Preserves Documentation: Optionally includes table and column comments
- Selective Generation: Generate proto for specific tables or entire databases
- Configurable: Supports both CLI flags and YAML configuration files
Key Features
Automatic Schema Discovery
Point the tool at your ClickHouse database and it discovers:
- Table structures with all columns
- Data types with proper proto mappings
- Column comments and documentation
- Nullable fields and arrays
Smart Type Mapping
Handles complex ClickHouse types intelligently:
- Native types: Int, UInt, Float, String, DateTime, UUID
- Large integers: Int128, Int256 mapped to strings
- Arrays: ClickHouse
Array(T)becomes protorepeated T - Nullable: Special nullable filter handling
- Complex types: Map, Tuple, Enum with appropriate representations
- UInt64 conversion: Optional string conversion for JavaScript/TypeScript safety
Developer Workflow Integration
Designed to fit into modern development workflows:
- CI/CD friendly: Use in automated pipelines
- Version control: Generate proto files to commit alongside code
- Type safety: Enable type-safe data access across languages
- Cross-platform: Generate code for Go, Python, JavaScript, Rust, and more
Type Mappings
Numeric Types
| ClickHouse Type | Proto Type | Notes |
|---|---|---|
| Int8, Int16, Int32 | int32 | |
| Int64 | int64 | |
| Int128, Int256 | string | No native protobuf support |
| UInt8, UInt16, UInt32 | uint32 | |
| UInt64 | uint64 | Optionally convert to string for JS |
| Float32 | float | |
| Float64 | double | |
| Decimal* | string | Preserves precision |
Other Types
| ClickHouse Type | Proto Type | Notes |
|---|---|---|
| String, FixedString | string | |
| Date, DateTime | string | ISO 8601 format |
| Bool | bool | |
| UUID | string | |
| Array(T) | repeated T | |
| Nullable(T) | Uses nullable filters | Special handling |
| Map | string | JSON representation |
| Tuple | string | JSON representation |
| Enum8, Enum16 | string | Enum value as string |
| IPv4, IPv6 | string | IP address as string |
UInt64 to String Conversion
JavaScript's Number type cannot safely represent integers beyond Number.MAX_SAFE_INTEGER (2^53-1). For JavaScript/TypeScript clients, you can configure specific UInt64 fields to be converted to strings:
conversion:
uint64_to_string:
fct_my_table:
- my_column_a
- my_column_b
int_my_other_table:
- my_column_x
Or via CLI:
# Specific field in specific table
--uint64-to-string "fct_my_table.my_column_a"
# All fields in table
--uint64-to-string "int_my_other_table.*"
# Specific field in all tables
--uint64-to-string "*.my_column_x"
# All UInt64 fields everywhere
--uint64-to-string "*.*"
Use Cases
Type-Safe Data Access
Generate strongly-typed client libraries:
- Go structs with proper field types
- Python dataclasses with type hints
- TypeScript interfaces with strict typing
- Rust structs with compile-time guarantees
API Development
Build APIs with consistent schemas:
- Use with CBT API to generate REST endpoints
- Create gRPC services with type-safe contracts
- Generate OpenAPI specs from proto annotations
Data Pipeline Engineering
Ensure type consistency across pipelines:
- Validate data transformations with type checking
- Generate consistent schemas for different environments
- Version control your data contracts
Cross-Platform Development
Enable multiple language clients:
- Backend services in Go
- Analytics scripts in Python
- Frontend applications in TypeScript
- Data processing in Rust
Examples
Basic Usage
# Generate proto for specific tables
clickhouse-proto-gen \
--dsn "clickhouse://localhost:9000/mydb" \
--tables users,orders,products \
--out ./proto \
--package myapp.v1
Using Docker
# Generate with docker
docker run --rm -v "$(pwd):/workspace" \
ethpandaops/clickhouse-proto-gen \
--dsn "clickhouse://user:[email protected]:9000/mydb" \
--tables beacon_blocks,attestations \
--out /workspace/proto \
--package ethereum.v1 \
--include-comments
Configuration File
# config.yaml
dsn: clickhouse://user:password@localhost:9000/mydb
tables:
- users
- orders
- products
output_dir: ./proto
package: myapp.clickhouse.v1
go_package: github.com/myorg/myapp/gen/clickhousev1
include_comments: true
Run with:
clickhouse-proto-gen --config config.yaml
Cross-Database Generation
Generate proto files for tables across multiple databases:
clickhouse-proto-gen \
--dsn "clickhouse://localhost:9000" \
--tables "db1.users,db2.products,db3.orders" \
--out ./proto
Example Output
Given this ClickHouse table:
CREATE TABLE users (
id UInt64,
email String,
name Nullable(String),
created_at DateTime,
tags Array(String)
) ENGINE = MergeTree()
ORDER BY id
COMMENT = 'User accounts table';
ClickHouse Proto Gen generates:
syntax = "proto3";
package myapp.v1;
option go_package = "github.com/myorg/myapp/gen/v1";
// User accounts table
message Users {
uint64 id = 11;
string email = 12;
optional string name = 13;
string created_at = 14;
repeated string tags = 15;
}
Integration with ethPandaOps Stack
ClickHouse Proto Gen is used internally by:
- CBT API: Automatically generates proto files for discovered tables
- CBT: Can be used to generate type-safe models for transformations
- Xatu Data Pipeline: Generates proto definitions for Xatu data tables
Resources
Related Tools
- CBT API: Uses clickhouse-proto-gen internally to generate REST APIs
- CBT: Data transformation tool that can benefit from proto-based type safety
- Protocol Buffers: Official protobuf documentation
Community
Need help or want to contribute?
- Report issues on GitHub
- Join us on the Ethereum R&D Discord
- Check out related tools in the ethPandaOps ecosystem