Parse individual files

You can use the Textual SDK to parse individual files, either from a local file system or from an S3 bucket.

For each parsed file, Textual returns a FileParseResult object, which is a wrapper around the output JSON for the processed file.

Parse a file from a local file system

To parse a single file from a local file system, use textual.parse_file:

# Open the file in binary mode ('rb') and pass its raw bytes to Textual.
with open('<path to the file>', 'rb') as f:
    byte_data = f.read()
    parsed_doc = textual.parse_file(byte_data, '<file name>')

You must open the file in rb access mode, which reads the file in binary format.
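If you parse local files in several places, the read-then-parse steps can be wrapped in a small helper. The following is a minimal sketch, not part of the SDK: parse_local_file is a hypothetical name, and client is assumed to be an already configured Textual client like the textual object in the snippet above.

```python
from pathlib import Path


def parse_local_file(client, path):
    """Read `path` in binary mode and pass its bytes to client.parse_file.

    Hypothetical helper, not part of the SDK. `client` is assumed to be an
    already configured Textual client (the `textual` object used above).
    """
    file_path = Path(path)
    # read_bytes() opens the file in binary mode, like open(path, 'rb').
    byte_data = file_path.read_bytes()
    # Pass the raw bytes and the bare file name, as in the snippet above.
    return client.parse_file(byte_data, file_path.name)
```

You would call it as parsed_doc = parse_local_file(textual, '<path to the file>').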

You can also set a timeout, in seconds, for the parsing. To set the timeout for a single call, add it as a parameter of the parse_file command. To set a timeout to use for all parsing, set the environment variable TONIC_TEXTUAL_PARSE_TIMEOUT_IN_SECONDS.
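The environment variable can be set in the shell or from Python itself. The sketch below sets it for the current process. set_parse_timeout is a hypothetical helper, and because this page does not say when the SDK reads the variable, the sketch assumes it is safest to set it before you create the Textual client.

```python
import os

# Variable name taken from the text above; the value is a number of seconds.
TIMEOUT_VAR = "TONIC_TEXTUAL_PARSE_TIMEOUT_IN_SECONDS"


def set_parse_timeout(seconds):
    """Set the default parse timeout for this process (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    # Environment variable values are always strings.
    os.environ[TIMEOUT_VAR] = str(int(seconds))


# Assumption: set this before creating the Textual client.
set_parse_timeout(300)
```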

Parse a file from an S3 bucket

You can also parse files that are stored in Amazon S3. Because this process uses the boto3 library to fetch the file from Amazon S3, you must first set up the correct AWS credentials.
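One common way to provide credentials to boto3 is through the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The following sketch checks whether they are set before you attempt a parse. missing_aws_env_vars is a hypothetical helper, and it only covers the environment variable method.

```python
import os

# Standard environment variables that boto3 checks for credentials.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(env=None):
    """Return the required AWS credential variables that are unset or empty.

    Hypothetical helper. Pass a dict as `env` to check something other
    than the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Not set in the environment:", ", ".join(missing))
```

boto3 can also find credentials in other places, such as the shared credentials file or an IAM role, so a missing variable does not necessarily mean that the parse will fail.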

To parse a file from an S3 bucket, use textual.parse_s3_file:

parsed_doc = textual.parse_s3_file('<bucket>','<key>')
