
Data Type Mappings

Data type mappings are crucial when migrating or processing data in external tables, as they ensure compatibility between different systems and help maintain data integrity. By accurately matching source and target data types, these mappings prevent data loss or corruption and enhance query performance through efficient processing.


Precautions

  • Only Extreme DPS clusters can be used to run queries on Delta Lake tables.

  • Relyt does NOT support data writes to Delta Lake tables; therefore, data type mappings are applicable only for reading data from these tables.

  • Relyt supports the following data types for partition columns in Delta Lake tables: boolean, byte, short, integer, long, float, double, decimal, timestamp, and date.


Data type mappings

The following table describes the mappings between Relyt and Delta Lake data types.

Delta Lake                    Relyt
----------------------------  -----------------------
boolean                       boolean
byte                          tinyint
short                         smallint
integer                       int
long                          bigint
float                         real
double                        double
decimal                       decimal (..., ...)
string                        varchar
date                          date
timestamp                     timestamp with timezone
timestamp without timezone    timestamp
binary                        N/A
array                         N/A
map                           N/A
struct                        N/A
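
As a rough illustration, the table above can be expressed as a lookup that flags columns whose Delta Lake type has no Relyt counterpart. This is only a sketch: the names `DELTA_TO_RELYT`, `relyt_type`, and `unreadable_columns` are hypothetical helpers, not part of Relyt, and the decimal precision and scale are left out because the table does not spell them out.

```python
from typing import Optional

# Delta Lake type -> Relyt type, transcribed from the mapping table.
# None marks types the table lists as N/A (no Relyt equivalent).
# Hypothetical helper; not a Relyt API.
DELTA_TO_RELYT = {
    "boolean": "boolean",
    "byte": "tinyint",
    "short": "smallint",
    "integer": "int",
    "long": "bigint",
    "float": "real",
    "double": "double",
    "decimal": "decimal",  # precision and scale omitted in this sketch
    "string": "varchar",
    "date": "date",
    "timestamp": "timestamp with timezone",
    "timestamp without timezone": "timestamp",
    "binary": None,
    "array": None,
    "map": None,
    "struct": None,
}


def relyt_type(delta_type: str) -> Optional[str]:
    """Return the Relyt type for a Delta Lake type name, or None if it is N/A."""
    key = delta_type.strip().lower()
    if key not in DELTA_TO_RELYT:
        raise KeyError(f"not listed in the mapping table: {delta_type!r}")
    return DELTA_TO_RELYT[key]


def unreadable_columns(schema: dict) -> list:
    """List the column names whose Delta Lake type has no Relyt mapping."""
    return [name for name, dtype in schema.items() if relyt_type(dtype) is None]
```

A check like `unreadable_columns({"id": "long", "tags": "array"})` could be run against a table schema before defining the external table, to spot columns that cannot be read.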


Usage notes

Timestamps with time zone can be ambiguous for dates before October 15, 1582, because the following calendars apply different leap-year rules:

  • Julian Calendar: Leap years occur every four years.

  • Proleptic Gregorian Calendar: Leap years occur every four years, except that a century year is a leap year only if it is divisible by 400.
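
The two leap-year rules can be sketched as follows, to show where they disagree. The function names are illustrative only, and the sketch covers positive years.

```python
def is_leap_julian(year: int) -> bool:
    """Julian rule: every fourth year is a leap year."""
    return year % 4 == 0


def is_leap_proleptic_gregorian(year: int) -> bool:
    """Gregorian rule: every fourth year, except century years
    that are not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Years up to 1582 on which the two calendars disagree about February 29.
disagreeing = [
    y for y in range(1, 1583)
    if is_leap_julian(y) != is_leap_proleptic_gregorian(y)
]
```

Each year in `disagreeing` is a century year that is not divisible by 400, and each one shifts the two calendars one more day apart. This is why the same date label can denote different days before October 15, 1582.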

Spark Version 2.x

  • Mixed Calendar: The Julian Calendar is used for dates before October 15, 1582, and the Gregorian Calendar is used from that date onward.
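
To make the effect of a mixed calendar concrete, the sketch below converts a date label read under a Julian/Gregorian hybrid into the Proleptic Gregorian date for the same physical day. It assumes a cutover at October 15, 1582 and uses the standard Julian Day Number formula; it is not Spark's implementation, and the function names are hypothetical.

```python
from datetime import date

# Julian Day Number of the day before 0001-01-01 (Proleptic Gregorian),
# used to translate a JDN into Python's date ordinal.
_JDN_ORDINAL_OFFSET = 1721425


def julian_to_proleptic_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian-calendar date to the Proleptic Gregorian date of the
    same day. Valid for dates that fall within Python's date range."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return date.fromordinal(jdn - _JDN_ORDINAL_OFFSET)


def hybrid_to_proleptic_gregorian(year: int, month: int, day: int) -> date:
    """Interpret a label under the hybrid calendar (Julian before
    1582-10-15, Gregorian from then on) as a Proleptic Gregorian date."""
    if (year, month, day) < (1582, 10, 15):
        return julian_to_proleptic_gregorian(year, month, day)
    return date(year, month, day)
```

For example, the hybrid label 1000-01-01 corresponds to 1000-01-06 in the Proleptic Gregorian Calendar, so a reader that assumes the wrong calendar sees old timestamps shifted by several days.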

Spark Version 3.x

  • Exception Handling: By default, an exception is thrown when writing values earlier than October 15, 1582.

  • Explicit Write Mode Specification:

    • legacy: Rebases values to the legacy hybrid (Julian and Gregorian) calendar used by Spark 2.x.

    • corrected: Writes values as is in the Proleptic Gregorian Calendar, without rebasing.

Relyt Extreme DPS

  • Calendar Utilization: Relyt Extreme DPS uses the Proleptic Gregorian Calendar for managing timestamp values.

  • Recommended Configuration: To read timestamp values accurately, write the data with Spark 3.x and set int96RebaseModeInWrite to corrected.
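
One possible way to pass this setting to a Spark 3.x job is sketched below. It assumes the full configuration key is `spark.sql.parquet.int96RebaseModeInWrite`, which applies to recent Spark 3.x releases (earlier ones used a `spark.sql.legacy.parquet.` prefix); verify the key against the documentation for your Spark version. The helper names are hypothetical.

```python
# Assumed full key; the text above gives only the short name.
INT96_REBASE_KEY = "spark.sql.parquet.int96RebaseModeInWrite"
VALID_MODES = ("EXCEPTION", "LEGACY", "CORRECTED")


def rebase_conf(mode: str = "CORRECTED") -> dict:
    """Build the Spark configuration entry for the INT96 rebase write mode."""
    mode = mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {INT96_REBASE_KEY: mode}


def spark_submit_args(conf: dict) -> list:
    """Render configuration entries as --conf arguments for spark-submit."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# With PySpark installed, the same dictionary could be applied to a session:
#   builder = SparkSession.builder
#   for key, value in rebase_conf().items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

Keeping the mode in one small helper makes it easier to apply the same value consistently to every job that writes data later read through Relyt.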