Delta Lake Tables

Delta Lake is an open-source project for building a Lakehouse architecture on top of existing data lakes. It provides ACID transactions, scalable metadata handling, and unified streaming and batch data processing on storage systems such as S3, OSS, and HDFS.

Delta Lake tables for Relyt enable you to query data stored in Delta Lake using Relyt's query semantics, delivering the high performance that Relyt offers.

important

Currently, only Extreme DPS supports Delta Lake table processing. Make sure to use Extreme DPS clusters for running queries on Delta Lake tables.


Delta Lake tables leverage the capabilities of Delta Lake, offering:

  • ACID transactions

  • Scalable metadata handling

  • Support for both streaming and batch data

With Delta Lake tables, you can implement data versioning, perform time travel queries, and effectively manage large-scale datasets, ensuring data consistency and reliability.
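
For illustration, a time travel query uses the standard Delta Lake syntax on the Spark side (Delta Lake 2.1 or later with Spark 3.3 or later). The table path below is a hypothetical placeholder; the steps for querying Delta Lake tables through Relyt itself are described later in this topic.

    -- Read an earlier snapshot of a Delta Lake table by version number or by timestamp.
    SELECT * FROM delta.`s3://my-bucket/path/to/table` VERSION AS OF 5;
    SELECT * FROM delta.`s3://my-bucket/path/to/table` TIMESTAMP AS OF '2024-01-01 00:00:00';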


Compatibility

Compatibility with Apache Spark

| Delta Lake Version | Apache Spark Version |
| ------------------ | -------------------- |
| 3.2.x              | 3.5.x                |
| 3.1.x              | 3.5.x                |
| 3.0.x              | 3.5.x                |
| 2.4.x              | 3.4.x                |

For more information, visit https://docs.delta.io/latest/releases.html.

Compatibility with AWS EMR

| AWS EMR Version | Delta Lake Version | Apache Spark Version | Presto Version | Trino Version |
| --------------- | ------------------ | -------------------- | -------------- | ------------- |
| emr-7.2.0       | 3.1.0              | 3.5.1                | 0.285          | 436           |
| emr-7.1.0       | 3.0.0              | 3.5.0                | 0.284          | 435           |
| emr-7.0.0       | 3.0.0              | 3.5.0                | 0.283          | 426           |

For more information, visit https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-components.html.


How to enable Delta Lake tables

Before using Delta Lake tables with Relyt, you must first integrate your data lake with Relyt and create an external schema.

  1. Integrate your external data lakes with Relyt.

    Before querying a Delta Lake table, ensure that your Relyt DW service units can integrate with the storage system and metastore of your Delta Lake.

    Currently, Relyt supports integration with data lakes on AWS Glue, Lake Formation, and S3. For more information, see Configure Access to S3 Data Sources Through Integration with AWS Lake Formation.

  2. Create an external schema.

    For information about syntax and examples, see CREATE EXTERNAL SCHEMA.

  3. Now, you can query metadata tables of your Delta Lake tables or run time travel queries on them.
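
Putting these steps together, a minimal sketch looks like the following. The external schema name, table name, and the elided CREATE EXTERNAL SCHEMA options are hypothetical placeholders; see CREATE EXTERNAL SCHEMA for the actual clauses your data lake integration requires.

    -- 1. Map a schema in Relyt to the metastore and storage of your Delta Lake
    --    (options elided; see CREATE EXTERNAL SCHEMA).
    CREATE EXTERNAL SCHEMA delta_schema ...;

    -- 2. Query a Delta Lake table through the external schema.
    SELECT count(*) FROM delta_schema.orders;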


Supported features and limitations

The following table outlines Relyt's support matrix for Delta Lake read features.

info

Currently, Relyt does not support writes to Delta Lake tables. Write support is under development and is expected to be available in Q1 2024.

| Feature | Min. Reader Version | Min. Delta Lake Version | Relyt's Support |
| ------- | ------------------- | ----------------------- | --------------- |
| Basic functionality | 1 | - | |
| CHECK constraints | 1 | Delta Lake 0.8.0 | |
| Generated columns | 1 | Delta Lake 1.0.0 | |
| Table features read | 1 | Delta Lake 2.3.0 | |
| Change data feed | 1 | Delta Lake 2.0.0 | |
| Columns mapping | 2 | Delta Lake 1.2.0 | |
| Deletion vectors | 2 | Delta Lake 2.3.0 | |
| Timestamp w/o Timezone | 3 | Delta Lake 2.4.0 | |
| V2 checkpoints | 3 | Delta Lake 3.0.0 | |
| Iceberg compatibility V1 | 2 | Delta Lake 3.0.0 | |
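
To see which protocol versions a given Delta Lake table actually requires, you can inspect the table on the Spark side; a minimal sketch with a placeholder path:

    -- DESCRIBE DETAIL reports, among other fields, the table's minReaderVersion
    -- and minWriterVersion, which can be checked against the matrix above.
    DESCRIBE DETAIL delta.`s3://my-bucket/path/to/table`;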


Support for column constraints

This section describes Relyt's support for column constraints. These are writer features, but they remain transparent to readers.


CHECK constraints

  • Minimum required Delta Lake version: 0.8.0

  • Feature name: checkConstraints

  • Read support by Relyt: ✅

  • Write support by Spark: ✅, by using

    ALTER TABLE delta.`{basePath}` ADD CONSTRAINT const1 CHECK (c_check > -2)

    ALTER TABLE delta.`{basePath}` DROP CONSTRAINT const1
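
As a usage note, a CHECK constraint is enforced by the writer, so reads through Relyt never see violating rows. A hedged sketch of a write that Spark (3.4 or later, for the column list syntax) would reject, reusing the constraint above:

    -- Fails on the Spark side: -5 violates CHECK (c_check > -2).
    INSERT INTO delta.`{basePath}` (c_check) VALUES (-5);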

Generated columns

  • Minimum required Delta Lake version: 1.0.0

  • Feature name: generatedColumns

  • Read support by Relyt: ✅

  • Write support by Spark: ✅, by using

    .addColumn("c_generated_name", "STRING", generatedAlwaysAs="concat(c_first_name, ' ', c_last_name)")
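
On the read side, which is what Relyt supports, a generated column behaves like any other column; a minimal sketch using the placeholder schema and table names from this topic:

    -- Relyt reads the stored value of the generated column directly.
    SELECT c_first_name, c_last_name, c_generated_name
    FROM {schemaName}.{tableName};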

Default columns

  • Minimum required Delta Lake version: 3.1.0

  • Feature name: allowColumnDefaults

  • Read support by Relyt: ✅

  • Write support by Spark: ✅, by using

    -- The allowColumnDefaults writer feature must be enabled on the table, either at
    -- creation time with
    --   TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'enabled')
    -- or on an existing table:
    ALTER TABLE test_col_constraint.col_constraint_base SET TBLPROPERTIES (
      'delta.minReaderVersion' = '1',
      'delta.minWriterVersion' = '7',
      'delta.feature.allowColumnDefaults' = 'enabled'
    );

    -- Define default values for individual columns:
    ALTER TABLE {schemaName}.{tableName} ALTER COLUMN c_date SET DEFAULT CURRENT_TIMESTAMP();

    ALTER TABLE {schemaName}.{tableName} ALTER COLUMN c_expr SET DEFAULT (3+5);

    ALTER TABLE {schemaName}.{tableName} ALTER COLUMN c_default SET DEFAULT 417;

    -- The defaults are applied when a writer supplies the DEFAULT keyword, for example
    -- in an INSERT ... VALUES clause such as:
    --   ('test_default_null_2', DEFAULT, DEFAULT, DEFAULT)

Identity columns

  • Minimum required Delta Lake version: 3.1.0

  • Feature name: identityColumns

  • Read support by Relyt: ❌

  • Write support by Spark: Not supported in open-source Spark


Support for schema evolution

Relyt supports reading tables whose schema has been changed with overwriteSchema, as well as schema evolution performed with mergeSchema and Delta Lake's column mapping feature.
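
For context on the writer side, overwriteSchema and mergeSchema are DataFrame writer options; the hedged Spark SQL sketch below shows two related ways a table's schema can evolve (an explicit column addition and Delta Lake's automatic schema merge setting), with placeholder table and column names:

    -- Let MERGE and append operations add new columns automatically.
    SET spark.databricks.delta.schema.autoMerge.enabled = true;

    -- Or evolve the schema explicitly with DDL.
    ALTER TABLE {schemaName}.{tableName} ADD COLUMNS (c_new_attribute STRING);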

For more information, see Schema Evolution.