Attach Apache Hive Metastore as a native DuckDB catalog and read/write the data inside with ease!
Installing and Loading
INSTALL hive_metastore FROM community;
LOAD hive_metastore;
Example
-- Attach the Hive Metastore as a catalog in DuckDB
ATTACH 'thrift://<host>:<port>' AS <catalog_name> (TYPE hive_metastore);
-- Query any HMS table as if it were native DuckDB
SELECT * FROM <catalog_name>.<schema_name>.<table_name>;
-- For object-storage-backed tables, configure storage secrets as usual
CREATE SECRET s3 (TYPE S3, KEY_ID 'access-key', SECRET 'secret-key', ENDPOINT 'localhost:9000');
-- Create a managed table (requires WAREHOUSE_LOCATION on ATTACH)
ATTACH 'thrift://localhost:9083' AS hms (TYPE hive_metastore, WAREHOUSE_LOCATION 's3a://my-bucket/warehouse');
CREATE TABLE hms.default.events (id INT, ts TIMESTAMP, payload VARCHAR);
INSERT INTO hms.default.events VALUES (1, NOW(), 'hello');
About hive_metastore
The DuckDB Hive Metastore extension connects DuckDB to an Apache Hive Metastore over the Thrift protocol and exposes HMS databases and tables as a first-class DuckDB catalog. Reads, writes, and table creation are all implemented natively, with DuckDB's format scanners handling the underlying data files.
Supported table formats
| Format | SELECT | INSERT | CTAS | CREATE TABLE |
|---|---|---|---|---|
| Parquet | yes | yes (non-partitioned) | yes | yes |
| CSV / Text | yes | yes (non-partitioned) | yes | yes |
| Delta Lake | yes | via delta extension |
no | no |
| Iceberg | yes¹ | no | no | no |
| Avro | yes | no (read-only) | no | metastore entry only |
| ORC | partial² | no | no | partial² |
¹ Iceberg tables created by Spark may require unsafe_enable_version_guessing = true.
² ORC support depends on DuckDB's underlying ORC scanner.
UPDATE, DELETE, and COPY TO against HMS tables are not supported - this matches Hive's append-only semantics.
Key capabilities
- Direct Thrift communication with the metastore - no REST catalog or Hadoop classpath needed
- Automatic format detection from HMS
inputFormat/outputFormat/ SerDe metadata - Schema and database discovery, including complex types (struct, array, map) from Spark-created tables
- Hive-style partitioning automatically applied at scan time
- S3-compatible object storage (AWS S3, MinIO, etc.) via DuckDB's
httpfs - Path normalization for Alibaba OSS (
oss://), Tencent COS (cos://,cosn://) - rewritten tos3://so any S3-compatible secret works - Avro logical-type handling:
date,timestamp-micros, anddecimalround-trip with correct SQL types - INSERT and CTAS write a single
data_<uuid>.<ext>file per statement to the table'sLOCATION; concurrent writers do not collide - CREATE TABLE creates managed (
MANAGED_TABLE) or external (EXTERNAL_TABLE) entries with proper Hive type strings, interoperable with Spark and Trino readers
Attaching a metastore
ATTACH 'thrift://<host>:<port>' AS <catalog_name> (<options>);
Options:
TYPE(required) - must behive_metastoreWAREHOUSE_LOCATION- default base path for managed tables created viaCREATE TABLEwithout an explicitLOCATION. When set, managed tables land at<warehouse>/<db>/<table>.DEFAULT_SCHEMA- schema used when a query omits the schema name. Defaults todefault.
Creating tables
ATTACH 'thrift://localhost:9083' AS hms
(TYPE hive_metastore, WAREHOUSE_LOCATION 's3a://my-bucket/warehouse');
-- Managed Parquet table (default format)
CREATE TABLE hms.default.sales (id INT, amount DECIMAL(10,2), region VARCHAR);
-- External table with explicit format and location
CREATE TABLE hms.default.events (id INT, ts TIMESTAMP)
WITH (location='s3a://my-bucket/events', format='parquet');
-- Avro: metastore entry only, written by Spark/Hive
CREATE TABLE hms.default.shipments (id INT, ship_date DATE)
WITH (location='s3a://my-bucket/shipments', format='avro');
Format and location are passed via standard SQL WITH (...) tags.
If neither LOCATION nor WAREHOUSE_LOCATION is available, CREATE TABLE fails with a clear error.
Reading Delta, Iceberg, and Avro
The extension delegates data reads to DuckDB's format scanners. Install and load the relevant extension before querying:
INSTALL delta; LOAD delta;
INSTALL iceberg; LOAD iceberg;
INSTALL avro; LOAD avro;
Object-storage credentials use the standard DuckDB CREATE SECRET flow:
CREATE SECRET s3 (
TYPE S3,
KEY_ID 'access-key',
SECRET 'secret-key',
ENDPOINT 's3.amazonaws.com'
);
Added Functions
This extension does not add any functions.
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
This extension does not add any settings.