Search Shortcut cmd + k | ctrl + k
onelake

This extension allows you to connect DuckDB to Microsoft Fabric OneLake workspaces and lakehouses, enabling you to query data stored in OneLake directly from DuckDB.

Maintainer(s): achrafcei

Installing and Loading

INSTALL onelake FROM community;
LOAD onelake;

Example

-- First, create a secret to authenticate with OneLake.
-- You can use either a service principal or the credential chain (e.g., Azure CLI).
-- Uncomment the desired method and provide the necessary credentials.
set azure_transport_option_type = 'curl';
CREATE SECRET  (
    TYPE azure,
    PROVIDER service_principal,
    TENANT_ID '<your_tenant_id>',
    CLIENT_ID '<your_client_id>',
    CLIENT_SECRET '<your_client_secret>'
);
-- CREATE SECRET  (
--     TYPE azure,
--     PROVIDER credential_chain,
--     CHAIN 'cli'
-- );

CREATE SECRET onelake (
    TYPE ONELAKE,
    TENANT_ID '<your_tenant_id>',
    CLIENT_ID '<your_client_id>',
    CLIENT_SECRET '<your_client_secret>'
);

-- CREATE SECRET  onelake(
--     TYPE ONELAKE,
--     PROVIDER credential_chain,
--     CHAIN 'cli'
-- );

-- Optional: use preissued tokens stored in env variables (defaults shown)
--SET onelake_env_fabric_token_variable = 'FABRIC_API_TOKEN';
--SET onelake_env_storage_token_variable = 'AZURE_STORAGE_TOKEN';
--CREATE SECRET onelake_env (
--    TYPE ONELAKE,
--    PROVIDER credential_chain,
--    CHAIN 'env'
--);

-- Attach to your OneLake workspace and lakehouse
ATTACH '<your_workspace_name>/<your_lakehouse_name>.Lakehouse'
    AS <your_connection_name>
    (TYPE ONELAKE);

USE <your_connection_name>.<your_schema_name>; -- e.g., dbo

SHOW TABLES;

-- Query tables (Delta or Iceberg)
SELECT * FROM <your_table_name> LIMIT 10;

-- Create a new Delta table
CREATE TABLE new_table (
    id INT,
    name VARCHAR
);

-- Create a new Delta table with partitioning
CREATE TABLE new_table_partitioned (
    id INT,
    name VARCHAR,
    region VARCHAR
) PARTITION BY (region);

-- Write to Delta tables (INSERT)
INSERT INTO new_table VALUES (1, 'New Data');

-- SET write mode to overwrite (default is append)
SET onelake_delta_write_mode = 'overwrite';
INSERT INTO new_table VALUES (2, 'Overwritten Data');

-- Perform destructive operations (UPDATE, DELETE, DROP)
-- These require explicit opt-in per session
SET onelake_allow_destructive_operations = true;

UPDATE new_table 
SET name = 'Updated Value' 
WHERE id = 1;

DELETE FROM new_table WHERE id = 1;

DROP TABLE new_table;

About onelake

This extension enables DuckDB to connect to Microsoft Fabric OneLake workspaces and lakehouses, allowing users to query and modify data stored in OneLake directly from DuckDB. It supports authentication via service principals, credential chains (e.g., Azure CLI), or environment tokens and provides seamless integration with OneLake's data storage capabilities.

Key Features:

  • Read support for both Delta and Iceberg tables.
  • Full write support for Delta tables including CREATE TABLE, INSERT, UPDATE, DELETE, and DROP.
  • Configurable write modes (append, overwrite) and schema evolution.
  • ALTER TABLE support is planned for future releases.

Added Functions

function_name function_type description comment examples
copy_dir table_macro NULL NULL  
delta_filter_pushdown_log table_macro NULL NULL  
delta_filter_pushdown_log_tpcds table_macro NULL NULL  
delta_list_files table NULL NULL  
delta_scan table NULL NULL  
get_delta_test_expression scalar NULL NULL  
parse_delta_filter_logline macro NULL NULL  
write_blob scalar NULL NULL  

Added Settings

name description input_type scope aliases      
auto_fallback_to_full_download Allows automatically falling back to full file downloads when possible. BOOLEAN GLOBAL []      
ca_cert_file Path to a custom certificate file for self-signed certificates. VARCHAR GLOBAL []      
delta_kernel_logging Forwards the internal logging of the Delta Kernel to the duckdb logger. Warning: this may impact performance even with DuckDB logging disabled. BOOLEAN GLOBAL []      
delta_scan_explain_files_filtered Adds the filtered files to the explain output. Warning: this may impact performance of delta scan during explain analyze queries. BOOLEAN GLOBAL []      
enable_curl_server_cert_verification Enable server side certificate verification for CURL backend. BOOLEAN GLOBAL []      
enable_server_cert_verification Enable server side certificate verification. BOOLEAN GLOBAL []      
force_download Forces upfront download of file BOOLEAN GLOBAL []      
hf_max_per_page Debug option to limit number of items returned in list requests UBIGINT GLOBAL []      
http_keep_alive Keep alive connections. Setting this to false can help when running into connection failures BOOLEAN GLOBAL []      
http_retries HTTP retries on I/O error UBIGINT GLOBAL []      
http_retry_backoff Backoff factor for exponentially increasing retry wait time FLOAT GLOBAL []      
http_retry_wait_ms Time between retries UBIGINT GLOBAL []      
http_timeout HTTP timeout read/write/connection/retry (in seconds) UBIGINT GLOBAL []      
httpfs_client_implementation Select which is the HTTPUtil implementation to be used VARCHAR GLOBAL []      
onelake_allow_destructive_operations Allow destructive operations like DROP TABLE (default: false) BOOLEAN GLOBAL []      
onelake_env_fabric_token_variable Environment variable name that stores the Fabric API access token VARCHAR GLOBAL []      
onelake_env_storage_token_variable Environment variable name that stores the OneLake storage access token VARCHAR GLOBAL []      
onelake_insert_mode Default INSERT mode (append overwrite error_if_exists ignore) VARCHAR GLOBAL []
onelake_partition_columns Comma-separated list of partition columns for the next CREATE TABLE VARCHAR GLOBAL []      
onelake_safe_cast Enable safe type casting during writes BOOLEAN GLOBAL []      
onelake_schema_mode Schema evolution mode (empty merge overwrite) VARCHAR GLOBAL []  
onelake_target_file_size Target Parquet file size in bytes (0 = default) UBIGINT GLOBAL []      
onelake_write_batch_size Number of rows per write batch (0 = default) UBIGINT GLOBAL []      
s3_access_key_id S3 Access Key ID VARCHAR GLOBAL []      
s3_endpoint S3 Endpoint VARCHAR GLOBAL []      
s3_kms_key_id S3 KMS Key ID VARCHAR GLOBAL []      
s3_region S3 Region VARCHAR GLOBAL []      
s3_requester_pays S3 use requester pays mode BOOLEAN GLOBAL []      
s3_secret_access_key S3 Access Key VARCHAR GLOBAL []      
s3_session_token S3 Session Token VARCHAR GLOBAL []      
s3_uploader_max_filesize S3 Uploader max filesize (between 50GB and 5TB) VARCHAR GLOBAL []      
s3_uploader_max_parts_per_file S3 Uploader max parts per file (between 1 and 10000) UBIGINT GLOBAL []      
s3_uploader_thread_limit S3 Uploader global thread limit UBIGINT GLOBAL []      
s3_url_compatibility_mode Disable Globs and Query Parameters on S3 URLs BOOLEAN GLOBAL []      
s3_url_style S3 URL style VARCHAR GLOBAL []      
s3_use_ssl S3 use SSL BOOLEAN GLOBAL []      
unsafe_disable_etag_checks Disable checks on ETag consistency BOOLEAN GLOBAL []