Structural differences and limitations with Amazon EMR
Required license: Professional or Enterprise
Not available on Structural Cloud.
No workspace inheritance
Amazon EMR workspaces do not support workspace inheritance.
Table mode limitations
You can only assign the De-Identify or Truncate table modes.
For Truncate mode, the table is ignored completely. The table does not exist in the destination database.
Generator limitations
Amazon EMR workspaces cannot use the following generators:
Algebraic
Array Character Scramble
Array JSON Mask
Array Regex Mask
Cross-Table Sum
CSV Mask
Event Timestamps
HTML Mask
JSON Mask
SIN
The following generators are supported, but with restrictions:
Character Scramble is only supported for text columns.
Timestamp Shift is only supported on date column types.
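The generator rules above can be captured in a small pre-flight check. This is only a sketch and is not part of Structural: the `check_assignment` helper, the simplified column type labels (`text`, `date`), and the sample column names are all hypothetical and would need to be mapped to your actual schema.

```python
# Hypothetical pre-flight check; not a Structural API.
# Generators that Amazon EMR workspaces cannot use (from the list above).
UNSUPPORTED_GENERATORS = {
    "Algebraic",
    "Array Character Scramble",
    "Array JSON Mask",
    "Array Regex Mask",
    "Cross-Table Sum",
    "CSV Mask",
    "Event Timestamps",
    "HTML Mask",
    "JSON Mask",
    "SIN",
}

# Generators that are supported only on certain column types.
# The type labels are simplified assumptions, not real Spark type names.
RESTRICTED_GENERATORS = {
    "Character Scramble": {"text"},
    "Timestamp Shift": {"date"},
}


def check_assignment(generator, column_type):
    """Return None if the assignment looks allowed, else a reason string."""
    if generator in UNSUPPORTED_GENERATORS:
        return f"{generator} is not available in Amazon EMR workspaces"
    allowed = RESTRICTED_GENERATORS.get(generator)
    if allowed is not None and column_type not in allowed:
        types = ", ".join(sorted(allowed))
        return f"{generator} is only supported on {types} columns"
    return None


# Hypothetical planned assignments: (column, generator, column type).
planned = [
    ("customers.notes", "Character Scramble", "text"),
    ("customers.profile", "JSON Mask", "text"),
    ("orders.amount", "Timestamp Shift", "integer"),
]

for column, generator, column_type in planned:
    problem = check_assignment(generator, column_type)
    if problem:
        print(f"{column}: {problem}")
```

Running a check like this before you configure the workspace can surface assignments that need a different generator.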
No subsetting, but support for table filtering
Amazon EMR workspaces do not support subsetting.
However, for tables that use the De-Identify table mode, you can provide a WHERE clause to filter the table. For details, go to Using table filtering for data warehouses and Spark-based data connectors.
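To illustrate the effect of a table filter, the following sketch applies a WHERE condition to a few sample rows. It uses Python's built-in `sqlite3` module purely as a stand-in; in an Amazon EMR workspace the filter is evaluated by Spark, whose SQL dialect may differ. The `orders` table, its columns, and the `filter_rows` helper are hypothetical.

```python
import sqlite3


def filter_rows(rows, where_clause):
    """Load sample rows into an in-memory table and apply a WHERE condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, region TEXT, order_year INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # Illustration only: the condition is trusted input in this sketch.
    query = (
        "SELECT id, region, order_year FROM orders "
        f"WHERE {where_clause} ORDER BY id"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical source rows: (id, region, order_year).
sample = [(1, "EU", 2022), (2, "US", 2023), (3, "EU", 2024)]

# Only rows that satisfy the condition remain after filtering.
kept = filter_rows(sample, "region = 'EU' AND order_year >= 2023")
print(kept)  # [(3, 'EU', 2024)]
```

Here the condition itself (`region = 'EU' AND order_year >= 2023`) plays the role of the text that you would provide as the table filter.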
No upsert
Amazon EMR workspaces do not support upsert.
No output to a container repository
For Amazon EMR workspaces, you cannot write the destination data to a container repository.
No output to an Ephemeral snapshot
For Amazon EMR workspaces, you cannot write the destination data to an Ephemeral snapshot.
Limited job logs
The logging of Spark jobs on the job details page is more limited than it is for other data connectors. This is because of how Spark clusters are distributed and managed.
The Jobs view provides information about the job's status as it runs.
After the job starts, it provides a tracking URL. The tracking URL leads to the Spark management portal, where you can find additional, more detailed logs.