Architecture Overview

Let’s take another look at the architecture of our processing pipeline. We have split the pipeline across two diagrams to make it easier to visualize.

Part 1

  1. Amazon Connect drops call recordings and contact trace records (CTRs) into Amazon S3. We will reuse our existing Amazon Connect instance, and new call recordings will be replicated into our data platform for ingestion.

    When we launch the script that creates our processing pipeline, it will create a new S3 bucket to serve as the replication target. It is much simpler for the script to assume a pristine S3 bucket than to probe for every condition that might be left over from your earlier experiments with your first replication bucket. (A sketch of the replication setup follows the diagram below.)

  2. Once the ingestion bucket receives a replicated call recording, the respective S3 event triggers an AWS Lambda function that splits the call recording into two separate audio files, one per stereo channel: one for the agent and one for the customer. The Lambda function stores both split audio files in the same S3 bucket, but under different virtual folders in the object keys (see the channel-split sketch after the diagram below).

  3. Storing each of these split audio files again raises an S3 event, which now triggers a different Lambda function. This Lambda function’s job is to invoke an AWS Step Functions state machine that takes care of the speech-to-text transcription of our split audio files (a sketch of this trigger follows the diagram below).

  4. The Step Functions state machine orchestrates three Lambda functions that use the Amazon Transcribe API to perform the transcription (sketched after the diagram below):

    • Submit the transcription job to Amazon Transcribe.
    • Check the current status of the transcription job. AWS Step Functions repeats this step until the transcription job is done.
    • Store the result of the transcription job in Amazon S3.

pipeline.architecture.01.png
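
To make step 1 concrete, here is a minimal sketch of how a script can enable S3 replication into the new ingestion bucket, roughly what the pipeline script does for you. The bucket names, key prefix, and replication role ARN are placeholders, not the values the lab actually uses:

```python
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "my-connect-recordings"  # placeholder: bucket Amazon Connect writes to
INGEST_BUCKET = "my-pipeline-ingest"     # placeholder: fresh bucket created by the script
REPLICATION_ROLE = "arn:aws:iam::123456789012:role/s3-replication"  # placeholder

# S3 replication requires versioning on both the source and the target bucket.
for bucket in (SOURCE_BUCKET, INGEST_BUCKET):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

# Replicate newly written call recordings into the ingestion bucket.
s3.put_bucket_replication(
    Bucket=SOURCE_BUCKET,
    ReplicationConfiguration={
        "Role": REPLICATION_ROLE,
        "Rules": [
            {
                "ID": "replicate-call-recordings",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": "connect/"},  # placeholder prefix
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{INGEST_BUCKET}"},
            }
        ],
    },
)
```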
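
For step 2, a Lambda handler along the following lines can split a stereo recording using only Python’s standard library (wave and audioop; note that audioop ships with Python up to 3.12). The channel-to-speaker mapping and the agent/ and customer/ folder names are illustrative assumptions, not taken from the lab code:

```python
import io
import wave
import audioop  # stdlib up to Python 3.12
import urllib.parse
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Split a stereo call recording into one mono WAV file per channel."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    with wave.open(io.BytesIO(body), "rb") as stereo:
        params = stereo.getparams()
        frames = stereo.readframes(params.nframes)

    # Assumption: channel 0 is the agent, channel 1 is the customer.
    for folder, left, right in (("agent", 1, 0), ("customer", 0, 1)):
        mono = audioop.tomono(frames, params.sampwidth, left, right)
        out = io.BytesIO()
        with wave.open(out, "wb") as split:
            split.setnchannels(1)
            split.setsampwidth(params.sampwidth)
            split.setframerate(params.framerate)
            split.writeframes(mono)
        # Same bucket, different virtual folder per channel.
        filename = key.rsplit("/", 1)[-1]
        s3.put_object(Bucket=bucket, Key=f"{folder}/{filename}", Body=out.getvalue())
```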
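
For step 3, the triggering Lambda function only has to start an execution of the state machine and hand over the object location. A minimal sketch, assuming the deployment passes the state machine ARN in as an environment variable:

```python
import json
import os
import urllib.parse
import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    """Start the transcription state machine for each split audio file."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        sfn.start_execution(
            stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # assumed env var
            input=json.dumps({"bucket": bucket, "key": key}),
        )
```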
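
For step 4, the three orchestrated Lambda functions map onto a handful of Amazon Transcribe calls. The handlers below are illustrative sketches: the job naming, the fixed en-US language code, and the transcripts/ output prefix are all assumptions.

```python
import urllib.request
import boto3

transcribe = boto3.client("transcribe")
s3 = boto3.client("s3")

def submit_job(event, context):
    """Submit the transcription job to Amazon Transcribe."""
    job_name = event["key"].replace("/", "-")  # job names must not contain '/'
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        LanguageCode="en-US",  # assumption: English-language calls
        MediaFormat="wav",
        Media={"MediaFileUri": f"s3://{event['bucket']}/{event['key']}"},
    )
    return {**event, "job_name": job_name}

def check_job(event, context):
    """Poll the job status; the state machine loops here until COMPLETED."""
    job = transcribe.get_transcription_job(
        TranscriptionJobName=event["job_name"]
    )["TranscriptionJob"]
    return {
        **event,
        "status": job["TranscriptionJobStatus"],
        "transcript_uri": job.get("Transcript", {}).get("TranscriptFileUri"),
    }

def store_result(event, context):
    """Fetch the finished transcript and store it in our own bucket."""
    with urllib.request.urlopen(event["transcript_uri"]) as response:
        transcript = response.read()
    s3.put_object(
        Bucket=event["bucket"],
        Key=f"transcripts/{event['job_name']}.json",  # assumed output prefix
        Body=transcript,
    )
```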

Part 2

  1. Storing the transcription text in Amazon S3 raises another S3 event, which triggers a Lambda function that creates an Amazon Comprehend job and stores the results (entities, sentiment, key phrases, detected language) back in Amazon S3 (see the first sketch after the diagram below).
  2. AWS Glue maintains a data catalog with a virtual table for our Amazon Comprehend results. With Amazon Athena we can run SQL queries against that virtual table (see the second sketch below).
  3. Towards the end of this lab we will also use Amazon QuickSight to visualize the findings of our Amazon Comprehend jobs.

pipeline.architecture.02.png
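
To make step 1 concrete: a Lambda function along the following lines can extract the findings from a finished transcript. For brevity this sketch uses Comprehend’s synchronous detect_* APIs rather than the asynchronous job APIs; the synchronous calls cap the input text size, so very long calls would need the job-based variants. The transcripts/ and comprehend/ key prefixes are assumptions:

```python
import json
import boto3

s3 = boto3.client("s3")
comprehend = boto3.client("comprehend")

def handler(event, context):
    """Analyze a finished transcript and store the findings in S3."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    transcript = json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
    text = transcript["results"]["transcripts"][0]["transcript"]

    # Detect the language first; the other calls need a language code.
    lang = comprehend.detect_dominant_language(Text=text)["Languages"][0]["LanguageCode"]

    findings = {
        "language": lang,
        "entities": comprehend.detect_entities(Text=text, LanguageCode=lang)["Entities"],
        "sentiment": comprehend.detect_sentiment(Text=text, LanguageCode=lang)["Sentiment"],
        "key_phrases": comprehend.detect_key_phrases(Text=text, LanguageCode=lang)["KeyPhrases"],
    }

    # Assumed layout: transcripts/... in, comprehend/... out.
    s3.put_object(
        Bucket=bucket,
        Key=key.replace("transcripts/", "comprehend/"),
        Body=json.dumps(findings),
    )
```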
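
And for step 2, once AWS Glue has cataloged the results, they can be queried through the Athena API (or interactively in the Athena console). The database, table, and output-location names below are hypothetical; the Glue data catalog in this lab defines the real ones:

```python
import boto3

athena = boto3.client("athena")

# Example: count calls per detected sentiment (hypothetical names).
response = athena.start_query_execution(
    QueryString="""
        SELECT sentiment, COUNT(*) AS calls
        FROM comprehend_results
        GROUP BY sentiment
    """,
    QueryExecutionContext={"Database": "call_analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-pipeline-ingest/athena-results/"},
)
print(response["QueryExecutionId"])
```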