Creating and editing pipelines

Creating a pipeline

To create a pipeline, on the Pipelines page, click Create a New Pipeline.

Setting the pipeline name and source type

On the Create A New Pipeline panel:

  1. In the Name field, type the name of the pipeline.

  2. Under Files Source, select the location of the source files.

    • To upload files from a local file system, click File upload, then click Save.

    • To select files from and write output to Amazon S3, click Amazon S3.

    • To select files from and write output to Databricks, click Databricks.

    • To select files from and write output to Azure Blob Storage, click Azure.

Providing Amazon S3 credentials

If you selected Amazon S3, provide the credentials to use to connect to Amazon S3.

  1. In the Access Key field, provide an AWS access key that is associated with an IAM user or role. For an example of a role that has the required permissions for an Amazon S3 pipeline, go to Example IAM role for Amazon S3 pipelines.

  2. In the Access Secret field, provide the secret key that is associated with the access key.

  3. From the Region dropdown list, select the AWS Region to send the authentication request to.

  4. In the Session Token field, provide the session token to use for the authentication request.

  5. To test the credentials, click Test AWS Connection.

  6. Click Save.

  7. On the Pipeline Settings page, provide the rest of the pipeline configuration. For more information, go to Configuring an Amazon S3 pipeline.

  8. Click Save.
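Before you paste credentials into the panel, it can help to confirm which kind of AWS access key you have. The following Python sketch relies on the AWS convention that access key IDs that begin with `ASIA` are temporary credentials from AWS STS, which only work together with their session token, while IDs that begin with `AKIA` are long-term keys for an IAM user. The function name `check_s3_credentials` is hypothetical. It only inspects the values locally and does not replace Test AWS Connection.

```python
from typing import List, Optional

# AWS key-ID prefix for temporary credentials issued by AWS STS.
# (Long-term IAM user keys start with "AKIA" and do not need a session token.)
TEMPORARY_PREFIX = "ASIA"


def check_s3_credentials(access_key: str, secret: str,
                         session_token: Optional[str] = None) -> List[str]:
    """Hypothetical local check of the Amazon S3 panel fields.

    Returns a list of problems; an empty list means the values look
    consistent. Nothing is sent to AWS.
    """
    problems: List[str] = []
    key = access_key.strip()
    if not key:
        problems.append("Access Key is empty")
    if not secret.strip():
        problems.append("Access Secret is empty")
    # Temporary STS credentials are rejected unless the session token is sent too.
    if key.startswith(TEMPORARY_PREFIX) and not (session_token or "").strip():
        problems.append("Access Key is temporary (ASIA...), so Session Token is required")
    return problems


print(check_s3_credentials("AKIAIOSFODNN7EXAMPLE", "example-secret"))
# []
print(check_s3_credentials("ASIAIOSFODNN7EXAMPLE", "example-secret"))
# ['Access Key is temporary (ASIA...), so Session Token is required']
```

If the check reports that a session token is required, copy the token that was issued together with the temporary key into the Session Token field.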

Providing Databricks connection information

If you selected Databricks, provide the connection information:

  1. In the Databricks URL field, provide the URL to the Databricks workspace.

  2. In the Access Token field, provide the Databricks access token to use to access the volume.

  3. To test the connection, click Test Databricks Connection.

  4. Click Save.

  5. On the Pipeline Settings page, provide the rest of the pipeline configuration. For more information, go to Configuring a Databricks pipeline.

  6. Click Save.
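Test Databricks Connection checks the URL and token for you. To check the same pair from a script, the following Python sketch builds, but does not send, a request to the Databricks REST API. It assumes personal access token authentication, where the token is sent in an `Authorization: Bearer` header, and uses the SCIM endpoint `/api/2.0/preview/scim/v2/Me`, which returns the user that the token belongs to. The helper name `build_databricks_check` and the example workspace URL are hypothetical.

```python
import urllib.request
from urllib.parse import urlparse

# Databricks REST endpoint that returns the identity of the token owner.
ME_PATH = "/api/2.0/preview/scim/v2/Me"


def build_databricks_check(workspace_url: str, token: str) -> urllib.request.Request:
    """Hypothetical helper: build a GET request that verifies a URL/token pair."""
    parsed = urlparse(workspace_url.strip())
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("Databricks URL should look like https://<workspace-host>")
    if not token.strip():
        raise ValueError("Access Token is empty")
    # Keep only the scheme and host so that a trailing slash or path does not break the URL.
    base = f"{parsed.scheme}://{parsed.netloc}"
    return urllib.request.Request(
        base + ME_PATH,
        headers={"Authorization": f"Bearer {token.strip()}"},
        method="GET",
    )


req = build_databricks_check("https://dbc-example.cloud.databricks.com/", "dapiEXAMPLE")
print(req.full_url)  # https://dbc-example.cloud.databricks.com/api/2.0/preview/scim/v2/Me
# To send it: urllib.request.urlopen(req). An invalid token should raise
# urllib.error.HTTPError (typically 401) instead of returning a response.
```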

Providing Azure credentials

If you selected Azure, provide the connection information:

  1. In the Account Name field, provide the name of your Azure Storage account.

  2. In the Account Key field, provide an access key for that storage account.

  3. To test the connection, click Test Azure Connection.

  4. Click Save.

  5. On the Pipeline Settings page, provide the rest of the pipeline configuration. For more information, go to Configuring an Azure pipeline.

  6. Click Save.
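The following Python sketch checks the Account Name and Account Key values locally and builds the Blob Storage endpoint and connection string that they imply. It assumes the public Azure cloud (`core.windows.net`), the Azure naming rule for storage accounts (3 to 24 characters, lowercase letters and digits only), and that account keys are base64-encoded. The helper name `describe_azure_account` is hypothetical, and nothing is sent to Azure.

```python
import base64
import binascii
import re
from typing import Tuple


def describe_azure_account(account_name: str, account_key: str) -> Tuple[str, str]:
    """Hypothetical helper: return (blob_endpoint, connection_string)."""
    name = account_name.strip()
    # Storage account names: 3-24 characters, lowercase letters and digits only.
    if not re.fullmatch(r"[a-z0-9]{3,24}", name):
        raise ValueError("Account Name must be 3-24 lowercase letters or digits")
    key = account_key.strip()
    if not key:
        raise ValueError("Account Key is empty")
    try:
        # Account keys are base64 text; validate=True rejects characters outside the alphabet.
        base64.b64decode(key, validate=True)
    except binascii.Error as exc:
        raise ValueError("Account Key is not valid base64") from exc
    blob_endpoint = f"https://{name}.blob.core.windows.net"
    connection_string = (
        "DefaultEndpointsProtocol=https;"
        f"AccountName={name};AccountKey={key};EndpointSuffix=core.windows.net"
    )
    return blob_endpoint, connection_string


endpoint, _ = describe_azure_account("mystorageacct", "ZXhhbXBsZWtleQ==")
print(endpoint)  # https://mystorageacct.blob.core.windows.net
```

A common source of failed connection tests is pasting the full connection string, or a key with a trailing character cut off, into the Account Key field; the base64 check catches many of these cases.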

Editing a pipeline

To update a pipeline configuration:

  1. Either:

    • On the Pipelines page, click the pipeline options menu, then click Settings.

    • On the pipeline details page, click the settings icon. For cloud storage pipelines, the settings icon is next to the Run Pipeline option. For uploaded file pipelines, the settings icon is next to the Upload Files option.

  2. On the Pipeline Settings page, update the configuration:

    • For all pipelines, you can change the pipeline name and whether to also create redacted versions of the original files.

    • For cloud storage pipelines, you can also change the file selection. For more information, go to Configuring an Amazon S3 pipeline, Configuring a Databricks pipeline, or Configuring an Azure pipeline.

    • For uploaded file pipelines, you do not manage files from the Pipeline Settings page. For information about uploading files, go to Selecting files for an uploaded file pipeline.

  3. Click Save.

Deleting a pipeline

To delete a pipeline, on the Pipeline Settings page, click Delete Pipeline.
