Search Shortcut cmd + k | ctrl + k
hive_metastore

Attach Apache Hive Metastore as a native DuckDB catalog and read/write the data inside with ease!

Maintainer(s): ilum-cloud, thijs-s

Installing and Loading

INSTALL hive_metastore FROM community;
LOAD hive_metastore;

Example

-- Attach the Hive Metastore as a catalog in DuckDB
ATTACH 'thrift://<host>:<port>' AS <catalog_name> (TYPE hive_metastore);

-- Query any HMS table as if it were native DuckDB
SELECT * FROM <catalog_name>.<schema_name>.<table_name>;

-- For object-storage-backed tables, configure storage secrets as usual
CREATE SECRET s3 (TYPE S3, KEY_ID 'access-key', SECRET 'secret-key', ENDPOINT 'localhost:9000');

-- Create a managed table (requires WAREHOUSE_LOCATION on ATTACH)
ATTACH 'thrift://localhost:9083' AS hms (TYPE hive_metastore, WAREHOUSE_LOCATION 's3a://my-bucket/warehouse');
CREATE TABLE hms.default.events (id INT, ts TIMESTAMP, payload VARCHAR);
INSERT INTO hms.default.events VALUES (1, NOW(), 'hello');

About hive_metastore

The DuckDB Hive Metastore extension connects DuckDB to an Apache Hive Metastore over the Thrift protocol and exposes HMS databases and tables as a first-class DuckDB catalog. Reads, writes, and table creation are all implemented natively, with DuckDB's format scanners handling the underlying data files.

Supported table formats

Format SELECT INSERT CTAS CREATE TABLE
Parquet yes yes (non-partitioned) yes yes
CSV / Text yes yes (non-partitioned) yes yes
Delta Lake yes via delta extension no no
Iceberg yes¹ no no no
Avro yes no (read-only) no metastore entry only
ORC partial² no no partial²

¹ Iceberg tables created by Spark may require unsafe_enable_version_guessing = true. ² ORC support depends on DuckDB's underlying ORC scanner.

UPDATE, DELETE, and COPY TO against HMS tables are not supported - this matches Hive's append-only semantics.

Key capabilities

  • Direct Thrift communication with the metastore - no REST catalog or Hadoop classpath needed
  • Automatic format detection from HMS inputFormat / outputFormat / SerDe metadata
  • Schema and database discovery, including complex types (struct, array, map) from Spark-created tables
  • Hive-style partitioning automatically applied at scan time
  • S3-compatible object storage (AWS S3, MinIO, etc.) via DuckDB's httpfs
  • Path normalization for Alibaba OSS (oss://), Tencent COS (cos://, cosn://) - rewritten to s3:// so any S3-compatible secret works
  • Avro logical-type handling: date, timestamp-micros, and decimal round-trip with correct SQL types
  • INSERT and CTAS write a single data_<uuid>.<ext> file per statement to the table's LOCATION; concurrent writers do not collide
  • CREATE TABLE creates managed (MANAGED_TABLE) or external (EXTERNAL_TABLE) entries with proper Hive type strings, interoperable with Spark and Trino readers

Attaching a metastore

ATTACH 'thrift://<host>:<port>' AS <catalog_name> (<options>);

Options:

  • TYPE (required) - must be hive_metastore
  • WAREHOUSE_LOCATION - default base path for managed tables created via CREATE TABLE without an explicit LOCATION. When set, managed tables land at <warehouse>/<db>/<table>.
  • DEFAULT_SCHEMA - schema used when a query omits the schema name. Defaults to default.

Creating tables

ATTACH 'thrift://localhost:9083' AS hms
  (TYPE hive_metastore, WAREHOUSE_LOCATION 's3a://my-bucket/warehouse');

-- Managed Parquet table (default format)
CREATE TABLE hms.default.sales (id INT, amount DECIMAL(10,2), region VARCHAR);

-- External table with explicit format and location
CREATE TABLE hms.default.events (id INT, ts TIMESTAMP)
  WITH (location='s3a://my-bucket/events', format='parquet');

-- Avro: metastore entry only, written by Spark/Hive
CREATE TABLE hms.default.shipments (id INT, ship_date DATE)
  WITH (location='s3a://my-bucket/shipments', format='avro');

Format and location are passed via standard SQL WITH (...) tags. If neither LOCATION nor WAREHOUSE_LOCATION is available, CREATE TABLE fails with a clear error.

Reading Delta, Iceberg, and Avro

The extension delegates data reads to DuckDB's format scanners. Install and load the relevant extension before querying:

INSTALL delta;    LOAD delta;
INSTALL iceberg;  LOAD iceberg;
INSTALL avro;     LOAD avro;

Object-storage credentials use the standard DuckDB CREATE SECRET flow:

CREATE SECRET s3 (
  TYPE S3,
  KEY_ID 'access-key',
  SECRET 'secret-key',
  ENDPOINT 's3.amazonaws.com'
);

Added Functions

This extension does not add any functions.

Overloaded Functions

This extension does not add any function overloads.

Added Types

This extension does not add any types.

Added Settings

This extension does not add any settings.