Structural differences and limitations with Amazon EMR

Required license: Professional or Enterprise

Not available on Structural Cloud.

No workspace inheritance

Amazon EMR workspaces do not support workspace inheritance.

Table mode limitations

You can assign only the De-Identify or Truncate table modes.

For Truncate mode, the table is ignored completely. The table does not exist in the destination database.

Generator limitations

Amazon EMR workspaces cannot use the following generators:

Algebraic
Array Character Scramble
Array JSON Mask
Array Regex Mask
Cross-Table Sum
CSV Mask
Event Timestamps
HTML Mask
JSON Mask
SIN

The following generators are supported, but with restrictions:

Character Scramble is only supported for text columns.
Timestamp Shift is only supported on date column types.
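
To catch unsupported assignments before you configure a workspace, you could check a planned list of generators against the lists above. The following is a minimal sketch and is not part of Structural. The generator names come from this page. The function name, the sample columns, and the simplified column kinds ("text", "date", "numeric") are hypothetical and do not correspond to actual Spark type names.

```python
# Generator names copied from the lists above; everything else is hypothetical.
UNSUPPORTED_GENERATORS = {
    "Algebraic", "Array Character Scramble", "Array JSON Mask", "Array Regex Mask",
    "Cross-Table Sum", "CSV Mask", "Event Timestamps", "HTML Mask", "JSON Mask", "SIN",
}

# Generators that are supported only for one kind of column.
# The kind labels are simplified assumptions, not real column type names.
RESTRICTED_GENERATORS = {
    "Character Scramble": "text",
    "Timestamp Shift": "date",
}


def check_assignment(generator, column_kind):
    """Return a problem description, or None if the assignment looks allowed."""
    if generator in UNSUPPORTED_GENERATORS:
        return f"{generator} is not available in Amazon EMR workspaces"
    required = RESTRICTED_GENERATORS.get(generator)
    if required is not None and column_kind != required:
        return f"{generator} requires a {required} column, not {column_kind}"
    return None


# Hypothetical plan: (column name, column kind, generator name).
plan = [
    ("users.email", "text", "Character Scramble"),
    ("users.profile", "text", "JSON Mask"),
    ("orders.total", "numeric", "Character Scramble"),
    ("orders.created", "date", "Timestamp Shift"),
]

# Collect and report every planned assignment that breaks a restriction.
problems = {}
for column, kind, generator in plan:
    message = check_assignment(generator, kind)
    if message is not None:
        problems[column] = message
        print(f"{column}: {message}")
```

In this sample plan, the check flags users.profile (unsupported generator) and orders.total (wrong column kind), and accepts the other two assignments.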

No subsetting, but support for table filtering

Amazon EMR workspaces do not support subsetting.

However, for tables that use the De-Identify table mode, you can provide a WHERE clause to filter the table. For details, go to Using table filtering for data warehouses and Spark-based data connectors.
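
Before you apply a filter, it can help to preview which rows the condition keeps by running an equivalent query against the source data yourself. The following is a minimal sketch of a helper that builds such a query. The function name, table name, and column name are hypothetical. The statement it produces is only for your own preview. This page does not describe the query that Structural runs, or whether the filter field expects the WHERE keyword, so the helper accepts the condition either way.

```python
def preview_filter(table, where_clause):
    """Build a SELECT statement that shows which rows a filter would keep."""
    clause = where_clause.strip()
    # Accept the condition with or without a leading WHERE keyword.
    if clause.upper().startswith("WHERE "):
        clause = clause[6:].strip()
    if not clause:
        raise ValueError("the filter condition is empty")
    return f"SELECT * FROM {table} WHERE {clause}"


# Hypothetical table and column names.
query = preview_filter("sales.orders", "order_date >= '2023-01-01'")
print(query)
```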

No upsert

Amazon EMR workspaces do not support upsert.

No output to a container repository

For Amazon EMR workspaces, you cannot write the destination data to a container repository.

No output to an Ephemeral snapshot

For Amazon EMR workspaces, you cannot write the destination data to an Ephemeral snapshot.

Limited job logs

The logging of Spark jobs on the job details page is more limited than it is for other data connectors. This is because of how Spark clusters are distributed and managed.

The Jobs view provides information about the job's status as it runs.

After the job starts, the Jobs view provides a tracking URL. The tracking URL leads to the Spark management portal, where you can find additional, more detailed logs.

Last updated 7 months ago
