Structural differences and limitations with Amazon EMR

Required license: Professional or Enterprise

Not available on Structural Cloud.

No workspace inheritance

Amazon EMR workspaces do not support workspace inheritance.

Table mode limitations

You can only assign the De-Identify or Truncate table modes.

For a table in Truncate mode, the table is ignored completely and does not exist in the destination database.

Generator limitations

Amazon EMR workspaces cannot use the following generators:

  • Algebraic

  • Array Character Scramble

  • Array JSON Mask

  • Array Regex Mask

  • Cross-Table Sum

  • CSV Mask

  • Event Timestamps

  • HTML Mask

  • JSON Mask

  • SIN

The following generators are supported, but with restrictions:

  • Character Scramble is only supported for text columns.

  • Timestamp Shift is only supported on date column types.
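The generator limitations above can be captured as a small pre-flight check. The following is a minimal sketch, not part of Structural: the function name `check_generator` and the column type labels `"text"` and `"date"` are assumptions made for illustration, and the generator names are copied from the lists above.

```python
from typing import Optional

# Generators listed above as unavailable for Amazon EMR workspaces.
UNSUPPORTED_GENERATORS = frozenset({
    "Algebraic",
    "Array Character Scramble",
    "Array JSON Mask",
    "Array Regex Mask",
    "Cross-Table Sum",
    "CSV Mask",
    "Event Timestamps",
    "HTML Mask",
    "JSON Mask",
    "SIN",
})

# Generators that are supported only for one kind of column.
# The type labels ("text", "date") are illustrative assumptions.
RESTRICTED_GENERATORS = {
    "Character Scramble": "text",
    "Timestamp Shift": "date",
}


def check_generator(generator: str, column_type: str) -> Optional[str]:
    """Return a description of the problem, or None if no listed limitation applies."""
    if generator in UNSUPPORTED_GENERATORS:
        return f"{generator} is not available for Amazon EMR workspaces"
    required_type = RESTRICTED_GENERATORS.get(generator)
    if required_type is not None and column_type != required_type:
        return f"{generator} is only supported for {required_type} columns"
    return None
```

For example, `check_generator("JSON Mask", "text")` returns a message, while `check_generator("Timestamp Shift", "date")` returns `None`.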

No subsetting, but support for table filtering

Amazon EMR workspaces do not support subsetting.

However, for tables that use the De-Identify table mode, you can provide a WHERE clause to filter the table. For details, go to Using table filtering for data warehouses and Spark-based data connectors.
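To illustrate the effect of a table filter, the sketch below applies a WHERE condition to a small in-memory table. It uses Python's built-in `sqlite3` module as a stand-in for Spark, so it only shows which rows a filter keeps. The table `orders`, its columns, and the condition are hypothetical, and the exact filter syntax that Structural accepts is not specified here.

```python
import sqlite3

# In-memory stand-in for a source table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "us-east"), (2, "eu-west"), (3, "us-east")],
)

# Hypothetical filter condition for a table in De-Identify mode.
where_clause = "region = 'us-east'"

# Only rows that match the condition would be processed.
rows = conn.execute(
    f"SELECT id, region FROM orders WHERE {where_clause} ORDER BY id"
).fetchall()
conn.close()
```

Here `rows` holds only the two `us-east` records. The `eu-west` row is excluded before any further processing.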

No upsert

Amazon EMR workspaces do not support upsert.

No output to container artifacts

For Amazon EMR workspaces, you cannot write the destination data to container artifacts.

No output to an Ephemeral snapshot

For Amazon EMR workspaces, you cannot write the destination data to an Ephemeral snapshot.

Limited job logs

The logging of Spark jobs on the job details page is more limited than it is for other data connectors. This is because of how Spark clusters are distributed and managed.

The Jobs view provides information about the job's status as it runs.

After the job starts, it provides a tracking URL. The tracking URL leads to the Spark management portal, where you can find additional, more detailed logs.
