Let’s take one more look at the architecture of our processing pipeline. We have split the pipeline across two diagrams to make it easier to visualize.
Amazon Connect drops call recordings and contact trace records (CTRs) into Amazon S3. We will reuse our existing Amazon Connect instance, and new call recordings will be replicated into our data platform for ingestion.
When we launch the script that creates our processing pipeline, it will create a new S3 bucket to serve as our replication target. Assuming a vanilla S3 bucket keeps the script much simpler than probing for all the conditions that might remain from your earlier experiments with your first replication bucket.
Once the ingestion bucket receives a replicated call recording, the resulting S3 event triggers an AWS Lambda function that splits the recording into two separate audio files, one per stereo channel, corresponding to the agent's and the customer's sides of the conversation, respectively. The Lambda function stores both split audio files in the same S3 bucket, but their object names contain different virtual folders.
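The core of that channel split can be done with Python's standard-library wave module. The sketch below shows only the splitting logic under the assumption of a 16-bit stereo WAV recording; the name split_channels is hypothetical, and the real Lambda function would additionally download the object from S3 and upload the two results.

```python
import wave

def split_channels(stereo_path, left_path, right_path):
    """Split a 16-bit stereo WAV file into two mono WAV files."""
    with wave.open(stereo_path, "rb") as src:
        assert src.getnchannels() == 2 and src.getsampwidth() == 2
        framerate = src.getframerate()
        frames = src.readframes(src.getnframes())
    # 16-bit stereo frames interleave a 2-byte left sample and a
    # 2-byte right sample, so each 4-byte chunk holds one frame.
    left = b"".join(frames[i:i + 2] for i in range(0, len(frames), 4))
    right = b"".join(frames[i + 2:i + 4] for i in range(0, len(frames), 4))
    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)      # mono output
            dst.setsampwidth(2)      # keep 16-bit samples
            dst.setframerate(framerate)
            dst.writeframes(data)
```

In practice you might instead reach for a library such as pydub for broader codec support, but the principle of de-interleaving the two channels stays the same.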
Storing each of these split audio files again raises an S3 event, which now triggers a different Lambda function. This Lambda function's job is to invoke an AWS Step Functions state machine that handles the speech-to-text transcription of our split audio files.
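A minimal sketch of what such a trigger function might look like: it pulls the bucket and key out of the S3 event record and starts an execution of the state machine via the Step Functions API. The environment variable STATE_MACHINE_ARN and the helper name build_execution_input are assumptions for illustration, not names from the actual pipeline script.

```python
import json
import os
import urllib.parse

def build_execution_input(event):
    """Extract bucket and key from the S3 event record that triggered us."""
    s3 = event["Records"][0]["s3"]
    return {
        "bucket": s3["bucket"]["name"],
        # S3 event keys are URL-encoded (spaces arrive as "+")
        "key": urllib.parse.unquote_plus(s3["object"]["key"]),
    }

def handler(event, context):
    import boto3  # provided by the AWS Lambda runtime
    sfn = boto3.client("stepfunctions")
    sfn.start_execution(
        stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # assumed env variable
        input=json.dumps(build_execution_input(event)),
    )
```

Keeping the event parsing in its own small function makes the handler easy to unit-test without any AWS credentials.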
The Step Functions state machine orchestrates three Lambda functions that use the Amazon Transcribe API to perform the transcription: