Delta Lake Tables
Delta Lake is an open-source project for building a Lakehouse architecture on top of existing data lakes. It provides ACID transactions, scalable metadata handling, and unified streaming and batch data processing on storage systems such as S3, OSS, and HDFS.
Delta Lake tables for Relyt enable you to query data stored in Delta Lake using Relyt's query semantics, delivering the high performance that Relyt offers.
Currently, only Extreme DPS supports Delta Lake table processing. Make sure to use Extreme DPS clusters for running queries on Delta Lake tables.
Delta Lake tables leverage the capabilities of Delta Lake, offering:
- ACID transactions
- Scalable metadata handling
- Support for both streaming and batch data
With Delta Lake tables, you can implement data versioning, perform time travel queries, and effectively manage large-scale datasets, ensuring data consistency and reliability.
Compatibility
Compatibility with Apache Spark
Delta Lake Version | Apache Spark Version |
---|---|
3.2.x | 3.5.x |
3.1.x | 3.5.x |
3.0.x | 3.5.x |
2.4.x | 3.4.x |
For more information, visit https://docs.delta.io/latest/releases.html.
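As a rough illustration, the table above can be encoded as a lookup that checks whether a given Delta Lake and Apache Spark pairing is listed. This is a minimal sketch: the names `DELTA_TO_SPARK`, `release_line`, and `is_compatible` are hypothetical helpers, not part of Relyt, Delta Lake, or Spark, and the mapping only covers the rows shown here.

```python
# Delta Lake release line -> compatible Apache Spark release line,
# copied from the compatibility table above.
DELTA_TO_SPARK = {
    "3.2": "3.5",
    "3.1": "3.5",
    "3.0": "3.5",
    "2.4": "3.4",
}


def release_line(version: str) -> str:
    """Reduce a full version such as '3.1.0' to its 'major.minor' line."""
    major, minor = version.split(".")[:2]
    return f"{major}.{minor}"


def is_compatible(delta_version: str, spark_version: str) -> bool:
    """Return True only if the table above lists this Delta Lake / Spark pairing."""
    expected = DELTA_TO_SPARK.get(release_line(delta_version))
    return expected is not None and expected == release_line(spark_version)
```

Versions that do not appear in the table are reported as incompatible, which keeps the check conservative rather than guessing about unlisted releases.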
Compatibility with AWS EMR
AWS EMR Version | Delta Lake Version | Apache Spark Version | Presto Version | Trino Version |
---|---|---|---|---|
emr-7.2.0 | 3.1.0 | 3.5.1 | 0.285 | 436 |
emr-7.1.0 | 3.0.0 | 3.5.0 | 0.284 | 435 |
emr-7.0.0 | 3.0.0 | 3.5.0 | 0.283 | 426 |
For more information, visit https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-components.html.
How to enable Delta Lake tables
Before using Delta Lake tables with Relyt, you must first integrate your data lake with Relyt and create an external schema.
- Integrate your external data lakes with Relyt.
  Before querying a Delta Lake table, ensure that your Relyt DW service units can integrate with the storage system and metastore of your Delta Lake.
  Currently, Relyt supports integration with data lakes on AWS Glue, Lake Formation, and S3. For more information, see Configure Access to S3 Data Sources Through Integration with AWS Lake Formation.
- Create an external schema.
  For information about syntax and examples, see CREATE EXTERNAL SCHEMA.
- Now, you can query metadata tables of your Delta Lake tables or run time travel queries on them.
Supported features and limitations
The following table outlines Relyt's support matrix for Delta Lake read features.
Currently, Relyt does not support writes to Delta Lake tables. Write support is under development, with completion expected in Q1 2024.
Feature | Min. Reader Version | Min. Delta Lake Version | Relyt's Support |
---|---|---|---|
Basic functionality | 1 | - | ✅ |
CHECK constraints | 1 | Delta Lake 0.8.0 | ✅ |
Generated columns | 1 | Delta Lake 1.0.0 | ✅ |
Table features read | 1 | Delta Lake 2.3.0 | ✅ |
Change data feed | 1 | Delta Lake 2.0.0 | ❌ |
Column mapping | 2 | Delta Lake 1.2.0 | ❌ |
Deletion vectors | 2 | Delta Lake 2.3.0 | ❌ |
Timestamp without time zone | 3 | Delta Lake 2.4.0 | ❌ |
V2 checkpoints | 3 | Delta Lake 3.0.0 | ❌ |
Iceberg compatibility V1 | 2 | Delta Lake 3.0.0 | ❌ |
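Before pointing Relyt at a table, it can help to check whether the table's protocol lists a reader feature that the matrix above marks as unsupported. The following is a minimal sketch under stated assumptions: it reads the `protocol` action from the text of a `_delta_log` commit file (one JSON action per line), and the mapping from matrix rows to the feature names `columnMapping`, `deletionVectors`, `timestampNtz`, and `v2Checkpoint` is this sketch's assumption based on the Delta protocol, not something Relyt documents. The function name is hypothetical.

```python
import json

# Reader-feature names assumed to correspond to rows marked ❌ in the table above.
UNSUPPORTED_READER_FEATURES = {
    "columnMapping",
    "deletionVectors",
    "timestampNtz",
    "v2Checkpoint",
}


def unsupported_features(commit_json_lines: str) -> set:
    """Return the reader features in a commit file that the matrix marks as unsupported.

    `commit_json_lines` is the text of one `_delta_log/*.json` commit file,
    which holds one JSON action per line.
    """
    found = set()
    for line in commit_json_lines.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        protocol = action.get("protocol")
        if protocol is None:
            continue  # commitInfo, metaData, add, remove, ... are ignored here
        # readerFeatures is only listed when the table uses table features.
        features = protocol.get("readerFeatures", [])
        found |= UNSUPPORTED_READER_FEATURES.intersection(features)
    return found


# Illustrative commit content, not taken from a real table.
sample = "\n".join([
    '{"commitInfo": {"operation": "CREATE TABLE"}}',
    '{"protocol": {"minReaderVersion": 3, "minWriterVersion": 7, '
    '"readerFeatures": ["deletionVectors"], "writerFeatures": ["deletionVectors"]}}',
])
print(unsupported_features(sample))
```

Tables on reader version 2 do not list their features explicitly, so this check does not cover them; for those tables the table properties would need to be inspected instead.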
Support for column constraints
This section explains Relyt's support for column constraints. While these features are writer features, they remain read-transparent.
- Minimum required Delta Lake version: 0.8.0
- Feature name: checkConstraints
- Read support by Relyt: ✅
- Write support by Spark: ✅, by using
  ALTER TABLE delta.`{basePath}` ADD CONSTRAINT const1 CHECK (c_check > -2)
  ALTER TABLE delta.`{basePath}` DROP CONSTRAINT const1

- Minimum required Delta Lake version: 1.0.0
- Feature name: generatedColumns
- Read support by Relyt: ✅
- Write support by Spark: ✅, by using
  .addColumn("c_generated_name", "STRING", generatedAlwaysAs="concat(c_first_name, ' ', c_last_name)")

- Minimum required Delta Lake version: 3.1.0
- Feature name: allowColumnDefaults
- Read support by Relyt: ✅
- Write support by Spark: ✅, by using
  TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'enabled')
  ALTER TABLE test_col_constraint.col_constraint_base SET TBLPROPERTIES (
    'delta.minReaderVersion' = '1',
    'delta.minWriterVersion' = '7',
    'delta.feature.allowColumnDefaults' = 'enabled'
  );
  ALTER TABLE {schemaName}.{tableName} ALTER COLUMN c_date SET DEFAULT CURRENT_TIMESTAMP();
  ALTER TABLE {schemaName}.{tableName} ALTER COLUMN c_expr SET DEFAULT (3+5);
  ALTER TABLE {schemaName}.{tableName} ALTER COLUMN c_default SET DEFAULT 417;
  ('test_default_null_2', DEFAULT, DEFAULT, DEFAULT)

- Minimum required Delta Lake version: 3.1.0
- Feature name: identityColumns
- Read support by Relyt: ❌
- Write support by Spark: Not supported in open-source Spark
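One way to see why these writer features are transparent to readers is that they are recorded in table metadata rather than in the data files. The sketch below inspects a `metaData` action from the Delta log and reports which of the four features a table uses. It assumes the metadata keys defined by the Delta protocol as understood here (`delta.constraints.<name>` in the table configuration, and `delta.generationExpression`, `CURRENT_DEFAULT`, and `delta.identity.*` in column metadata); the function name and the example action are hypothetical.

```python
import json


def column_constraint_features(metadata_action: dict) -> set:
    """List the column-constraint features that a table's metadata makes use of."""
    meta = metadata_action["metaData"]
    used = set()

    # CHECK constraints are assumed to be stored as delta.constraints.<name>.
    if any(key.startswith("delta.constraints.") for key in meta.get("configuration", {})):
        used.add("checkConstraints")

    # The other three features are assumed to be recorded in per-column metadata.
    schema = json.loads(meta["schemaString"])
    for field in schema["fields"]:
        col_meta = field.get("metadata", {})
        if "delta.generationExpression" in col_meta:
            used.add("generatedColumns")
        if "CURRENT_DEFAULT" in col_meta:
            used.add("allowColumnDefaults")
        if any(key.startswith("delta.identity.") for key in col_meta):
            used.add("identityColumns")
    return used


# Illustrative metaData action, not taken from a real table.
example = {
    "metaData": {
        "configuration": {"delta.constraints.const1": "c_check > -2"},
        "schemaString": json.dumps({
            "type": "struct",
            "fields": [
                {"name": "c_check", "type": "integer", "nullable": True, "metadata": {}},
                {"name": "c_default", "type": "integer", "nullable": True,
                 "metadata": {"CURRENT_DEFAULT": "417"}},
            ],
        }),
    }
}
print(sorted(column_constraint_features(example)))
```

A result containing `identityColumns` would indicate a table that, according to the list above, Relyt cannot read.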
Support for schema evolution
Relyt supports reading tables whose schema was changed via overwriteSchema. It also supports schema evolution using mergeSchema and Delta Lake’s column mapping feature.
For more information, see Schema Evolution.
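The difference between the two write options can be sketched with a deliberately simplified model in which a schema is a mapping from column name to type name. This is not Delta Lake's implementation: the function names are hypothetical, and the model rejects every type change, whereas Delta Lake itself permits some.

```python
def merge_schema(table_schema: dict, incoming_schema: dict) -> dict:
    """Simplified model of mergeSchema: keep existing columns, append new ones."""
    merged = dict(table_schema)  # dicts preserve insertion order
    for name, dtype in incoming_schema.items():
        if name not in merged:
            merged[name] = dtype  # new column is appended at the end
        elif merged[name] != dtype:
            raise ValueError(f"type conflict for column {name!r}")
    return merged


def overwrite_schema(table_schema: dict, incoming_schema: dict) -> dict:
    """Simplified model of overwriteSchema: the incoming schema replaces the old one."""
    return dict(incoming_schema)


current = {"id": "long", "name": "string"}
incoming = {"id": "long", "email": "string"}
print(merge_schema(current, incoming))
print(overwrite_schema(current, incoming))
```

In this model, merging keeps `name` and adds `email`, while overwriting drops `name` entirely. A reader therefore has to resolve the schema from the table's current metadata rather than from any individual data file.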