Creating and editing pipelines

Creating a pipeline

To create a pipeline, either:

  • On the Pipelines page, click Create a New Pipeline.

  • On the Home page, click Create, then click Pipeline.

Setting the pipeline name and source type

On the Create a New Pipeline panel:

  1. In the Name field, type the name of the pipeline.

  2. Under Files Source, select the location of the source files.

    • To upload files from a local file system, click File upload, then click Save.

    • To select files from and write output to Amazon S3, click Amazon S3.

    • To select files from and write output to Databricks, click Databricks.

Providing Amazon S3 credentials

If you selected Amazon S3, provide the credentials to use to connect to Amazon S3.

  1. In the Access Key field, provide an AWS access key that is associated with an IAM user or role. For an example role that has the required permissions for an Amazon S3 pipeline, go to Example IAM role for Amazon S3 pipelines.

  2. In the Access Secret field, provide the secret key that is associated with the access key.

  3. From the Region dropdown list, select the AWS Region to send the authentication request to.

  4. In the Session Token field, provide the session token to use for the authentication request.

  5. To test the credentials, click Test AWS Connection. (For a way to check the same credentials outside the product, see the sketch after these steps.)

  6. Click Save.

  7. On the pipeline Settings page, provide the rest of the pipeline configuration. For more information, go to Configuring an Amazon S3 pipeline.

  8. Click Save.
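
As noted in step 5, Test AWS Connection runs the credential check for you. The following minimal sketch verifies the same values outside the product with the AWS SDK for Python (boto3). The key values and the bucket name are placeholders, and the STS and S3 calls are this sketch's own approximation of a connection test, not the product's implementation.

```python
# Minimal sketch: check an AWS access key, secret, session token, and
# Region outside the product. Requires boto3 (pip install boto3).
# All values below are placeholders, not real credentials.
import boto3
from botocore.exceptions import ClientError

session = boto3.Session(
    aws_access_key_id="AKIA...",           # Access Key field
    aws_secret_access_key="wJalr...",      # Access Secret field
    aws_session_token="FwoGZXIvYXdzE...",  # Session Token field; used with temporary credentials
    region_name="us-east-1",               # Region dropdown selection
)

try:
    # GetCallerIdentity succeeds with any valid credentials and reports
    # which IAM user or role the access key belongs to.
    identity = session.client("sts").get_caller_identity()
    print("Credentials are valid for:", identity["Arn"])

    # Listing one object approximates the read permission the pipeline
    # needs. "my-pipeline-source-bucket" is a hypothetical bucket name.
    session.client("s3").list_objects_v2(
        Bucket="my-pipeline-source-bucket", MaxKeys=1
    )
    print("Read access to the bucket confirmed.")
except ClientError as err:
    print("Credential check failed:", err)
```

If the GetCallerIdentity call succeeds but the bucket listing fails, the key pair is valid but the IAM user or role lacks the S3 permissions the pipeline needs; compare the role against Example IAM role for Amazon S3 pipelines.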

Providing Databricks connection information

If you selected Databricks, provide the connection information:

  1. In the Databricks URL field, provide the URL to the Databricks workspace.

  2. In the Access Token field, provide the access token used to access the volume.

  3. To test the connection, click Test Databricks Connection. (For a way to check the connection details outside the product, see the sketch after these steps.)

  4. Click Save.

  5. On the pipeline Settings page, provide the rest of the pipeline configuration. For more information, go to Configuring a Databricks pipeline.

  6. Click Save.
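
As noted in step 3, Test Databricks Connection runs the connection check for you. The following minimal sketch verifies the same workspace URL and token outside the product by calling the Databricks REST API's SCIM Me endpoint, which returns the identity a token authenticates as. The URL and token values are placeholders, and this is an illustrative check under those assumptions, not the product's actual test.

```python
# Minimal sketch: check a Databricks workspace URL and personal access
# token outside the product. Requires requests (pip install requests).
# The URL and token below are placeholders, not real values.
import requests

DATABRICKS_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # Databricks URL field
ACCESS_TOKEN = "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXX"                      # Access Token field

# A 200 response confirms that the URL reaches a workspace and that the
# token is accepted; the response body names the authenticated user.
resp = requests.get(
    f"{DATABRICKS_URL}/api/2.0/preview/scim/v2/Me",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=10,
)
if resp.ok:
    print("Token is valid for:", resp.json().get("userName"))
else:
    print("Connection check failed:", resp.status_code, resp.text)
```

A successful response confirms the URL and token; access to the specific volume still depends on the permissions granted to the user or service principal that the token belongs to.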

Editing a pipeline

To update a pipeline configuration:

  1. Either:

    • On the Pipelines page, click the pipeline options menu, then click Settings.

    • On the pipeline details page, click the settings icon. For Amazon S3 and Databricks pipelines, the settings icon is next to the Run Pipeline option. For uploaded file pipelines, the settings icon is next to the Upload Files option.

  2. On the Pipeline Settings page, update the configuration.

    • For all pipelines, you can change the pipeline name and whether to also create redacted versions of the original files.

    • For Amazon S3 and Databricks pipelines, you can change the file selection. For more information, go to Configuring an Amazon S3 pipeline or Configuring a Databricks pipeline.

    • For uploaded file pipelines, you do not manage files from the Settings page. For information about uploading files, go to Selecting files for an uploaded file pipeline.

  3. Click Save.

Deleting a pipeline

To delete a pipeline, on the Pipeline Settings page, click Delete Pipeline.
