Using the Spark SDK to run data generation

To use the Structural SDK to de-identify data in a Spark SDK workspace, you run a Spark program.

For details about the SDK, go to the Tonic SDK Javadoc.

Here is a basic example of using the SDK to run data generation on a workspace and writing the output to a DataFrame:

// Sets a statistics seed for the data generation
val baseStatisticsSeed = 489465

// Identifies the Structural instance and workspace, and provides the API token
val workspace = Workspace.createWorkspace("https://path/to/tonic", "<<api-token>>", "<<workspace-id>>", baseStatisticsSeed)

// Reads the source data into a Spark DataFrame
val sourceDf = spark.read.parquet("s3://parquet/source/users")

// Runs data generation on the source DataFrame for the users table and returns the de-identified output
val processedDf = workspace.processDataframe("users", sourceDf)
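
The returned value is a standard Spark DataFrame, so you can persist the de-identified output with any Spark writer. The following sketch uses an illustrative destination path; adjust the path and write mode for your environment:

// Writes the de-identified output to a destination location (the path here is illustrative)
processedDf.write.mode("overwrite").parquet("s3://parquet/output/users")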
