Amazon Athena

  1. Go the the Amazon Athena console. The navigation section on the left shows tables, databases, and optionally data catalogs. Select the Glue catalog (if you see this dropdown), select comprehendgluedatabase-... as a database. In this database there is a table defined already (as a result of the AWS CloudFormation template you launched earlier in this chapter) called sentiment_analysis. pipeline.athena-01.png

  2. In the main section of the screen you can now enter SQL queries to work on your data. Remember, the data we’re talking about are the results of your Amazon Comprehend jobs that were stored in your Amazon S3 bucket. Just give it a try with retrieving all content from your virtual table - you can copy the SQL query from the code block below and paste it into the Athena console. Then click the Run query button. Your query results will appear in the Results area below the query editor. Take a moment to explore the query results.

    select * from sentiment_analysis;

    pipeline.athena-02.png

  3. Now click the arrowhead directly left of your table name in the navigation section on the left of the screen. You will see a drop down list of the column names and types of your virtual table. pipeline.athena-03.png

  4. Based on what you have seen so far about the definition and the content of your table, you can formulate a more sophisticated SQL query with projection and selection. Let’s retrieve the sentiment data only for our customer transcriptions - you can copy the query from the code block below.

    select sentiment from sentiment_analysis where talker = 'Customer';

    pipeline.athena-04.png

  5. Let’s explore the Amazon Athena console a bit more. In the lower left quadrant of the screen you can see an empty list of views. Views are a well-known concept in relational databases (a pre-defined query kept in the database dictionary that users can treat like a table). Click Create view and you’ll see a sample view definition in the query editor.

    Try to create a view yourself that retrieves the sentiment values for customer transcripts only and then fire an SQL query to your view.

    pipeline.athena-05.png

  6. Now click the AWS Glue Data Catalog link at the top of the Amazon Athena console. You will land on the AWS Glue console and you will see (at least) the table that you’ve just been using from Amazon Athena. pipeline.athena-06.png

  7. Click the Databases link in the navigation area on the left to see an overview of your AWS Glue databases. There may be other databases in the list, for instance one that is called default and one that is called sampledb. pipeline.athena-07.png

  8. Click your database name and then on your table name to get to the details view for your table. In the Location field you can see that the tables is based on objects in your Amazon S3 bucket whose object keys start with comprehend/. pipeline.athena-08.png

  9. When we go back to the Amazon Athena console, you can retrieve the current amount of tuples in your table using the following query.

    select count(*) from sentiment_analysis;

    If you like it to be auto-formatted, you can press the Format query button at the bottom right of the query editor. That will lead you to the following format:

    SELECT count(*)
    FROM sentiment_analysis;

    You can verify that, after a new call recording is added, the amount of tuples in your table will be +2.

    If you want you can download the two analysis results from a particular call from your Amazon S3 bucket, change the file name locally to reflect a random new caller ID, and change the caller ID within the JSON document as well to the same new one. When you upload the changed JSON documents to your Amazon S3 bucket into the virtual folder comprehend, you can verify that the amount of tuples in your query will be +2 compared to your previous query result.

    pipeline.athena-09.png