Using the Spark SDK to run data generation
To use the Structural SDK to de-identify data in a Spark SDK workspace, you run a Spark program.
For details about the SDK, go to the Tonic SDK Javadoc.
The following basic example uses the SDK to run data generation on a workspace and write the output to a DataFrame:
// Set the statistics seed for the data generation
val baseStatisticsSeed = 489465
// Identify the workspace, and provide the API token and workspace ID
val workspace = Workspace.createWorkspace("https://path/to/tonic", "<<api-token>>", "<<workspace-id>>", baseStatisticsSeed)
// Read the source data into a DataFrame
val sourceDf = spark.read.parquet("s3://parquet/source/users")
// Run data generation on the users table and return the de-identified output as a DataFrame
val processedDf = workspace.processDataframe("users", sourceDf)
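The returned DataFrame can be used like any other Spark DataFrame. As a minimal sketch, you might persist the de-identified output back to storage; the destination path here is a placeholder, not part of the SDK:
// Write the de-identified output as Parquet to a placeholder destination
processedDf.write.parquet("s3://parquet/destination/users")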