Asset-centric data orchestration with partitions, declarative automation sensor, Snowflake-compatible CREATE DYNAMIC ASSET, MCP server, lineage and OpenLineage emission — all on DuckDB
Installing and Loading
INSTALL duckorch FROM community;
LOAD duckorch;
Example
LOAD duckorch;
-- 1. Set up state schema and register a directory of *.sql task files.
PRAGMA orch_init;
PRAGMA orch_register('./tasks/');
-- 2. Run the DAG (topological order, per-layer parallelism).
SET orch_max_parallel = 4;
PRAGMA orch_run;
-- 3. Asset-aware features (Phase 13+):
PRAGMA orch_asset_list; -- all registered Assets
PRAGMA orch_asset_partitions('analytics.daily'); -- per-partition status
PRAGMA orch_backfill('analytics.daily', '2026-01-01', '2026-12-31');
-- 4. Snowflake-compatible declaration (Phase 17):
PRAGMA orch_create_dynamic_asset(
'analytics.region_total',
'5 minutes',
'SELECT region, SUM(total) AS rt FROM analytics.daily GROUP BY region');
-- 5. Sensor — auto-materializes when upstream updates (Phase 15):
PRAGMA orch_sensor_set_interval(30);
PRAGMA orch_sensor_start;
-- 6. Mermaid lineage:
PRAGMA orch_visualize('lineage');
About duckorch
duckorch is a DuckDB-native, single-file Asset orchestrator that
brings Dagster's declarative automation and Snowflake's
CREATE DYNAMIC TABLE semantics into a LOAD duckorch;. Designed to
run on a laptop or in a plane without a cloud control plane.
Task definition (SQLMesh-style headers in plain .sql files):
-- @asset name=analytics.user_stats
-- @asset_group sales
-- @partitions_by daily(start=2026-01-01)
-- @param partition_key:DATE
-- @automation eager AND NOT in_progress()
-- @freshness max_lag=60min
-- @check name=positive "SELECT MIN(rev) FROM ${asset}" expect gt 0
CREATE OR REPLACE TABLE analytics.user_stats AS
SELECT country, SUM(rev) AS rev
FROM analytics.clean_users
WHERE event_date = $partition_key
GROUP BY country;
Inputs auto-extracted via sqlparser-rs. $partition_key bound via
DuckDB PREPARE (multi-statement aware).
Capabilities:
- Tasks + DAG (Phase 0-9): directory-loaded SQL files, topological
execution, per-layer parallelism, exponential backoff retry, downstream
skip,
@test, Mermaid (lineage/dag/combined), OpenLineage events, table- and column-level lineage with subtype taxonomy, opt-in ad-hoc query capture, DuckLake-aware OL namespace. - MCP server (Phase 11):
duck-orch-mcpstdio server (rmcp 0.3) with 9 tools for Claude Code integration (list_assets,run_pipelinewithdry_rundefault, etc.). - Asset (Phase 13):
@assetheaders promote outputs to first-class Assets in__orch__.assetswith materialization history, code_version hash, declared edges. Backward compatible with@outputs. - Partition (Phase 14):
@partitions_by daily/static/multi,$partition_keybind,orch_backfill, calendar-style ⚪ ASCII view. - AutomationCondition + Sensor (Phase 15): DSL
(
eager / on_cron / on_missing / freshness_violated / in_progress&/|/!),@target_lagthrottle, backgroundstd::threadsensor that polls and auto-firesRunSingleTaskwhen conditions met.
- Freshness + Asset Check (Phase 16):
@freshness max_lag=...,@check name=N "<SQL>" expect <op> <value>with auto-run on success andseverity=errorblocking downstream. - Snowflake
CREATE DYNAMIC ASSET(Phase 17):PRAGMA orch_create_dynamic_asset(name, target_lag, sql)synthesizes Asset +automation_condition='eager()'so the sensor picks it up. Theduck-orch dynamic migrate-from-snowflake <dump>CLI parses Snowflake dumps and registers each block (skippingWAREHOUSE/REFRESH_MODE).
3-route surface: every feature reachable from CLI (duck-orch ...),
SQL (PRAGMA orch_*), and MCP (Claude Code).
Companion CLI: duck-orch with register / run / status / graph /
test / validate / impact / lineage / schedule / asset / backfill /
automation / sensor / check / dynamic subcommands, all --json-capable.
State tables: __orch__.{tasks, runs, lineage_edges, column_lineage,
task_edges, tests, schedules, assets, asset_materializations,
asset_edges, asset_partitions, automation_evaluations, asset_checks,
asset_check_results}.
Architecture: thin C++ shim (~3000 lines, registers PRAGMAs, runs SQL via per-thread Connection, hosts the sensor thread, exposes OptimizerExtension for ad-hoc capture) plus a Rust workspace (orch_common / orch_dag / orch_lineage / orch_runtime / orch_ol / orch_core / orch_cli / orch_mcp), keeping all logic in Rust while the C++ layer handles DuckDB-internal calls.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| orch_asset_health | pragma | NULL | NULL | |
| orch_asset_lineage | pragma | NULL | NULL | |
| orch_asset_list | pragma | NULL | NULL | |
| orch_asset_list_group | pragma | NULL | NULL | |
| orch_asset_materializations | pragma | NULL | NULL | |
| orch_asset_partitions | pragma | NULL | NULL | |
| orch_asset_partitions_calendar | pragma | NULL | NULL | |
| orch_asset_show | pragma | NULL | NULL | |
| orch_automation_simulate | pragma | NULL | NULL | |
| orch_automation_status | pragma | NULL | NULL | |
| orch_backfill | pragma | NULL | NULL | |
| orch_backfill_missing | pragma | NULL | NULL | |
| orch_build_dag | scalar | NULL | NULL | |
| orch_check_history | pragma | NULL | NULL | |
| orch_check_run | pragma | NULL | NULL | |
| orch_create_dynamic_asset | pragma | NULL | NULL | |
| orch_downstream_of | scalar | NULL | NULL | |
| orch_dynamic_list | pragma | NULL | NULL | |
| orch_dynamic_refresh | pragma | NULL | NULL | |
| orch_extract_io | scalar | NULL | NULL | |
| orch_hello | scalar | NULL | NULL | |
| orch_init | pragma | NULL | NULL | |
| orch_load_directory_json | scalar | NULL | NULL | |
| orch_parse_task | scalar | NULL | NULL | |
| orch_register | pragma | NULL | NULL | |
| orch_render_mermaid | scalar | NULL | NULL | |
| orch_run | pragma | NULL | NULL | |
| orch_run_partition | pragma | NULL | NULL | |
| orch_sensor_set_interval | pragma | NULL | NULL | |
| orch_sensor_start | pragma | NULL | NULL | |
| orch_sensor_status | pragma | NULL | NULL | |
| orch_sensor_stop | pragma | NULL | NULL | |
| orch_test | pragma | NULL | NULL | |
| orch_visualize | pragma | NULL | NULL |
Overloaded Functions
This extension does not add any function overloads.
Added Types
This extension does not add any types.
Added Settings
| name | description | input_type | scope | aliases |
|---|---|---|---|---|
| orch_capture_interactive | Capture column lineage for ad-hoc INSERT/CTAS queries via ParserExtension | BOOLEAN | GLOBAL | [] |
| orch_max_parallel | Maximum parallel tasks per DAG layer | BIGINT | GLOBAL | [] |
| orch_namespace | Job namespace for OpenLineage events | VARCHAR | GLOBAL | [] |
| orch_openlineage_api_key | OpenLineage API key | VARCHAR | GLOBAL | [] |
| orch_openlineage_debug | Log OpenLineage events to stderr | BOOLEAN | GLOBAL | [] |
| orch_openlineage_url | OpenLineage backend URL (e.g. http://localhost:5000/api/v1/lineage) | VARCHAR | GLOBAL | [] |