
u/PrideDense2206

Delta Lake Community Sync: IcebergCompatV3 in delta-kernel-rs
Wondering about Apache Iceberg v3 compatibility inside delta-kernel-rs? Check out this video to hear from Songhang.
Want to get involved in the project? Reach out on the Delta Users slack (https://go.delta.io/slack) through the #delta-kernel channel.
Recording is live for the Delta Lake Community Meetup - 5/19
Thanks to everyone who came out for today's Delta Lake community meetup.
We covered the following:
- Changes to the way we handle GitHub issues and PR backlogs, and the introduction of the "stale" and "not-stale" labels.
- We introduced catalog commits and catalog-managed tables, and discussed how traditional Delta tables (transaction log in the file system) differ from catalog-managed tables, where commits are resolved and managed via Unity Catalog or any other "catalog-commit"-enabled catalog. The Delta table still syncs via the catalog to the file system, but the source of truth for commits and table state resides in the catalog.
- We discussed when, where, how, and why to use traditional Delta Lake tables vs. catalog-managed tables.
- We talked about performance optimizations and updates to delta-rs. See https://github.com/delta-io/delta-rs/blob/main/CHANGELOG.md for rust-v0.32.3 and python-v1.6.0 to follow along with the changes Tyler was discussing.
- We finished with a discussion of the unified Delta kernel: what it takes to provide a single kernel for both the Java and Rust ecosystems. Nick dove into the new changes in Java FFM (the Foreign Function & Memory API), discussing how it changes native function calling in a world that was JNI-only for years. This makes it easier to call Java functions from Rust and enables the round trip between delta-kernel-rs and the calling engine.
Thanks again for coming out. As always, pop over to https://delta.io/community/ to get involved.
---
If you want to test out catalog managed tables with Apache Spark, please take a look at our unitycatalog-playground project.
Also, if you want to learn more about delta-rs, take a look at this video: https://www.youtube.com/watch?v=i_jwF2sLRFs from Robert Pack and Zach Schuerman.
Halcyon Drop Rate?
I’m wondering if people are finding themselves playing through multiple biomes with maybe one drop or less. Is there a trick to getting it to drop more, or is this more of a grind-it-out thing?
Delta Lake Community Meetup
Hey everyone, we’ve got another exciting community meetup coming up next week. We’ve got all the details and an area for discussions or questions/answers on GitHub (link below).
See you next week!
Delta Grows Up: Writes, Unity Catalog and Time Travel
TL;DR: DuckDB's Delta and Unity Catalog extensions shed their experimental tags — now with writes, Unity Catalog and time travel support.
This trophy is a nod to Expedition 33
Cause I love that game too.
What are Delta Lake Catalog-Managed Tables?
The next evolution of Delta: Catalog-Managed Tables
The data ecosystem is moving toward a catalog-centric model for managing open table formats. As open catalogs gain adoption, the catalog has emerged as the system of record for table identity, discovery, and authorization.
With Delta Lake 4.1.0, Delta introduces catalog-managed tables, which establish the catalog as the coordinator of table access and source of truth for table state. This simplifies how tables are discovered and secured, enables consistent governance across engines, and unlocks faster performance. The design also aligns Delta with the catalog-managed model pioneered by Iceberg, creating a shared foundation for interoperable, high-performance lakehouse tables.
Unity Catalog is the first open lakehouse catalog to support catalog-managed tables, extending unified governance across any format.
What are Catalog-Managed Tables?
Catalog-managed tables are tables for which the catalog brokers table access as well as stores the table’s latest metadata and commits. Clients reference the table by name, not by path, and use the catalog to resolve the table’s storage location. The catalog also manages concurrency control for proposed writes to a table. Writers leverage the catalog, not object store APIs, for atomic commits.
For more details, see the Delta protocol RFC on GitHub here. See how Unity Catalog implements support for the Catalog-Managed Tables specification here.
Before: Challenges with Delta tables that were managed by the filesystem
Before catalog-managed tables, the filesystem – not the catalog – was the primary authority for table access and changes to table state.
To access filesystem-managed Delta tables, Delta clients first look at the transaction log (_delta_log) stored with the table to determine the latest version. Clients then reconstruct the current state of the table by replaying the log entries, which describe the table’s schema and data files that belong to the table. Once the table state is known, the system reads the relevant data files to answer the query. When writing to the table, clients write new data files to storage and then atomically commit a new transaction log entry via filesystem APIs to advance the table to a new version.
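The replay step described above can be sketched roughly as follows. This is a minimal illustration, not Delta's actual implementation: the log is an in-memory dict rather than files under _delta_log, the `replay` helper is hypothetical, and only the `metaData`, `add`, and `remove` actions are handled, with a simplified schema string.

```python
import json


def replay(commits):
    """Replay commits in version order and return (schema, active data files).

    `commits` maps a version number to the newline-delimited JSON body of
    that commit (a stand-in for the files stored under _delta_log).
    """
    schema = None
    files = set()
    for version in sorted(commits):
        for line in commits[version].splitlines():
            action = json.loads(line)
            if "metaData" in action:
                # The most recent metadata action defines the current schema.
                schema = action["metaData"]["schemaString"]
            elif "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return schema, files


# Hypothetical three-commit log: create the table, append, then rewrite a file.
log = {
    0: '{"metaData": {"schemaString": "id INT"}}\n{"add": {"path": "part-0.parquet"}}',
    1: '{"add": {"path": "part-1.parquet"}}',
    2: '{"remove": {"path": "part-0.parquet"}}\n{"add": {"path": "part-2.parquet"}}',
}
schema, files = replay(log)
```

After replay, `files` holds only the data files that still belong to the table at the latest version, which is the set a reader would scan to answer a query.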
Historically, data teams have faced the following challenges with filesystem-managed Delta tables:
- Brittle path-based access: Delta clients have to know the exact path of the filesystem-managed table they are accessing, and credentials have to be provisioned directly by the storage system. This tightly couples applications to physical storage locations, so routine changes like table relocation, storage reorganization, or credential rotation could break pipelines and queries.
- Risky coarse-grained authorization: Filesystems lack fine-grained access control, so complying with data privacy requirements often requires splitting datasets across multiple tables or storage paths to isolate sensitive fields or records. This leads to duplicated data, fragmented governance, and fragile pipelines.
- Unsafe schema changes: Path-based writes can modify table schemas or metadata without validation, potentially introducing incompatible changes that break downstream workloads. This occurs because storage credentials cannot distinguish between clients authorized to write data and those authorized to modify table metadata.
- Bottlenecked performance: Replaying the Delta transaction log to resolve a table’s latest state requires multiple calls to the filesystem, which can add 100 ms or more to query execution.
Now: Catalog-Managed Delta Tables address these challenges
Catalog-managed tables address the governance and performance challenges by involving the catalog in read, write, and authorization coordination. This way, teams can unlock:
- Standardized table discovery: The catalog provides stable logical table identifiers (such as Unity Catalog’s three-level namespace), eliminating the need for clients to depend on physical storage paths for discovery.
- Unified governance: The catalog is responsible for granting clients access to data, rather than teams needing to manage fragmented access policies across their storage systems. This dramatically simplifies how data teams ensure engines access their data in a governed manner.
- Enforceable constraints: The catalog can authoritatively validate or reject schema and constraint changes, preventing incompatible updates that could compromise data integrity or break downstream workloads.
- Faster query planning and faster writes: If a Delta client is trying to access a table, the catalog can directly inform it of the table-level metadata. This skips cloud storage entirely and removes a major source of metadata latency. This feature also opens the door for “inline commits” where the (metadata) content of the commit is sent directly to the catalog.
Catalog-managed Delta tables dramatically simplify how engines discover and access data under consistent governance, all while improving read and write performance. Table state updates are flushed to the filesystem, reinforcing Delta’s openness and portability.
How do Catalog-Managed Tables work?
The Catalog-Managed Tables Delta feature fundamentally changes how Delta tables are discovered, read, and committed to.
Table Discovery
For catalog-managed tables, Delta tables are discovered and accessed through the catalog, not by filesystem paths. Engines must first resolve a table by name via the catalog, establishing table identity, location, and access credentials. This resolution step occurs before the Delta client interacts with the filesystem and determines the rules the client must follow for subsequent reads and writes.
Reads
A catalog-managed table may have commits that have been ratified by the catalog but not yet flushed, or “published”, to the filesystem. Reads therefore begin by getting these latest commits from the catalog, typically via a get_catalog_commits API exposed by the catalog.
If additional history is required, such as older published commits or checkpoints, Delta clients can LIST the filesystem and merge those published commits with the catalog-provided commits to construct a complete snapshot. This split view allows catalogs to always provide the most recent table state while offloading long-term commit storage to the filesystem.
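One way to picture the split view is as a merge of two version-to-path maps. The sketch below is an assumption-laden illustration: `merge_commits` and the paths are hypothetical, and it assumes that a version present in both places has identical contents, so it arbitrarily prefers the published path.

```python
def merge_commits(published, from_catalog):
    """Combine published commits (found by LISTing the filesystem) with
    ratified-but-unpublished commits returned by the catalog.

    Both arguments map version -> commit file path. Returns a list of
    (version, path) pairs in version order, or raises if versions are missing.
    """
    merged = dict(from_catalog)
    # Assumption: overlapping versions are identical, so prefer published.
    merged.update(published)
    versions = sorted(merged)
    for prev, cur in zip(versions, versions[1:]):
        if cur != prev + 1:
            raise ValueError(f"missing commits between versions {prev} and {cur}")
    return [(v, merged[v]) for v in versions]


# Hypothetical paths; real commit file names follow the Delta protocol.
published = {0: "_delta_log/v0.json", 1: "_delta_log/v1.json"}
from_catalog = {
    1: "_delta_log/_staged_commits/v1.staged.json",
    2: "_delta_log/_staged_commits/v2.staged.json",
}
snapshot_commits = merge_commits(published, from_catalog)
```

Here version 2 exists only in the catalog, so a reader that listed the filesystem alone would see a stale table.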
Writes
Previously, writing to a Delta table involved calling filesystem “PUT-if-absent” APIs to perform atomic writes with mutual exclusion. In this model, the filesystem determined which writes win. While simple and scalable, this approach treated commits as opaque blobs: the filesystem could not inspect commit contents, enforce constraints, or coordinate writes across tables.
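As a rough local-filesystem analogy for “PUT-if-absent”, Python's exclusive-create mode (`open(path, "x")`) fails if the file already exists, so only one writer can claim a given version. The `commit_if_absent` helper is hypothetical, and the sketch ignores details a real object store has to handle, such as making the written content itself atomic.

```python
import os
import tempfile


def commit_path(log_dir, version):
    # Delta commit files are named by zero-padded version number.
    return os.path.join(log_dir, f"{version:020d}.json")


def commit_if_absent(log_dir, version, payload):
    """Try to create the commit file for `version`; return True if we won."""
    try:
        # Mode "x" creates the file exclusively and raises if it exists.
        with open(commit_path(log_dir, version), "x") as f:
            f.write(payload)
        return True
    except FileExistsError:
        return False


with tempfile.TemporaryDirectory() as log_dir:
    # Two writers race to commit version 1; the filesystem picks the winner.
    first = commit_if_absent(log_dir, 1, '{"writer": "A"}')
    second = commit_if_absent(log_dir, 1, '{"writer": "B"}')
    with open(commit_path(log_dir, 1)) as f:
        winner = f.read()
```

The losing writer learns only that the file exists. Nothing in this exchange inspects what either commit contains, which is the limitation catalog-managed tables address.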
For catalog-managed tables, clients propose commits to the catalog, typically by first staging commits in the filesystem’s <table_path>/_delta_log/_staged_commits directory and then requesting ratification. Staging ensures that readers never observe unapproved commits. The protocol also allows for “inline” commits, where the contents of the commit are sent directly to the catalog, skipping a filesystem write that can take 100 ms or more. Staged commits are still performed using optimistic concurrency control to provide transactional guarantees.
Catalogs can also define their own commit APIs, allowing them to accept richer commit payloads, inspect actions and metadata, enforce constraints, and apply catalog-level policies before ratifying a commit.
To unburden catalogs from having to store these ratified commits indefinitely, ratified commits can be periodically “published” to the _delta_log in the filesystem. Once published, catalogs no longer need to retain or serve those commits because clients can easily discover them by listing.
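The propose, ratify, and publish flow can be sketched with a toy in-memory catalog. Everything here is hypothetical (the `InMemoryCatalog` class, its method names, and the staged paths); it only illustrates the optimistic concurrency check, and a real catalog such as Unity Catalog defines its own commit API.

```python
class InMemoryCatalog:
    """Toy stand-in for a catalog that ratifies commits for one table."""

    def __init__(self, latest_version=0):
        self.latest_version = latest_version
        self.unpublished = {}  # version -> staged commit path

    def ratify(self, version, staged_path):
        # Optimistic concurrency: accept only the next version in sequence.
        if version != self.latest_version + 1:
            return False
        self.unpublished[version] = staged_path
        self.latest_version = version
        return True

    def get_commits(self):
        # Ratified commits that have not yet been published to _delta_log.
        return dict(self.unpublished)

    def mark_published(self, up_to_version):
        # Once published, the catalog no longer needs to retain these commits.
        for v in [v for v in self.unpublished if v <= up_to_version]:
            del self.unpublished[v]


catalog = InMemoryCatalog(latest_version=0)

# Writers A and B both read version 0 and stage a commit for version 1.
a_ok = catalog.ratify(1, "_staged_commits/v1.writer-a.json")
b_ok = catalog.ratify(1, "_staged_commits/v1.writer-b.json")

# B lost the race, so it re-reads the latest version and retries as version 2.
b_retry_ok = catalog.ratify(catalog.latest_version + 1, "_staged_commits/v2.writer-b.json")

before_publish = catalog.get_commits()
catalog.mark_published(1)
after_publish = catalog.get_commits()
```

Because the catalog sits in the commit path, `ratify` is also the natural place to inspect a commit's contents and reject schema or constraint changes before they become visible.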
Evolving open table formats
Catalog-managed Delta tables represent a critical convergence between how data is stored and how it is governed. Open table formats and open catalogs are evolving together so that governance becomes a native property of the table itself rather than an external overlay.
As an added benefit, Delta’s new catalog-oriented design closely resembles that of Iceberg tables. Ultimately, this makes it simpler for practitioners to discover and govern data consistently, regardless of table format.
We are excited to continue collaborating with the ecosystem to evolve Delta with open catalogs so that they deliver performant commits, efficient metadata management, multi-engine interoperability, and unified governance.
>You can read the original blog post over on delta.io
To learn more about catalog-managed tables and catalog commits, check out our video on YouTube.
If you are a completionist, then your Silksong journey is going to take you to some interesting places. The First Sinner is hands down one of my favorite boss fights. It leaves no room for error and she even heals if you let her.
Anyone else have a great time in this fight?
The Delta Lake 4 journey has marked a shift from the file system to the catalog. Each release has deepened support for catalog-managed tables and extended that design philosophy across the Delta ecosystem. Delta Lake 4.2 advances on two fronts: Kernel expands outward with a new Apache Flink connector, streaming improvements, and broader data type support. Catalog-managed tables also mature with atomic operations, schema evolution from SQL, and synchronous UniForm.
New kernel-based Apache Flink Connector
-- Create the clickstream landing table as a Unity Catalog managed table
CREATE TEMPORARY TABLE clickstream_raw (
event_date STRING,
event_type STRING,
user_id STRING
) WITH (
'connector' = 'delta',
'table_name' = 'clickstream_raw',
'unitycatalog.name' = 'prod',
'unitycatalog.endpoint' = '<endpoint>',
'unitycatalog.token' = '<token>',
'partitions' = 'event_date',
'uid' = 'clickstream-ingest'
);
-- Stream events into the table via Flink SQL
INSERT INTO clickstream_raw VALUES
('2026-04-20', 'click', 'user_1'),
('2026-04-20', 'purchase', 'user_2'),
('2026-04-22', 'click', 'user_4');
Simplified Schema Evolution
INSERT INTO … BY NAME now supports automatic schema evolution when autoMerge is enabled, adding new columns to the table schema as part of the commit. For SQL-first teams, this removes one of the last reasons to drop into a DataFrame notebook just to evolve a schema.
SET spark.databricks.delta.schema.autoMerge.enabled = true;
INSERT INTO prod.consumer.clickstream BY NAME
SELECT event_date, event_type, user_id, device_type
FROM prod.consumer.clickstream_raw
WHERE event_date = '2026-04-23';
Data Type Support
In Delta Kernel, we've added support for geospatial, collation, and variant types. Here’s how a clickstream pipeline can push event-specific properties into a single Variant payload:
CREATE TABLE prod.consumer.clickstream_v2 (
event_date DATE,
event_type STRING,
user_id STRING,
device_type STRING,
properties VARIANT
)
USING DELTA
PARTITIONED BY (event_date);
INSERT INTO prod.consumer.clickstream_v2 BY NAME
SELECT event_date, event_type, user_id, device_type,
parse_json(raw_properties) AS properties
FROM prod.consumer.clickstream_raw WHERE event_date = '2026-04-24';
These are just a few highlights from the release. For the full blog post, take a look at https://delta.io/blog/2026-04-17-delta-4-2-released/.
Join us for the first Delta Lake Community Meetup of 2026 on Tuesday, April 21 at 9AM PT! 🚀
We’re bringing the community together for a deep dive into the ecosystem, infrastructure enhancements, and the future project roadmap. Come get your technical questions answered live by the maintainers.
What we'll cover:
🔹 Latest Delta Lake updates and how the community is evolving
🔹 A technical look at infrastructure enhancements
🔹 The future of Delta Lake: Roadmap insights and a deep dive into Iceberg v4 compatible metadata
🔹 Live Q&A with the community
RSVP 👇