Query and visualize data in S3 - Athena & Quicksight

December 1, 2016

AWS has released many new services this week. I was mostly interested of Athena and decided to take quick peek into the functionality.

So this is the initial screen you have in Athena. On the left are your databases and tables, on the right your query and its results.

post2_1

First you need to add some datasources. In this example I uploaded Wikipedia posts data provided by Indeed.com (http://opensource.indeedeng.io/imhotep/docs/sample-data/) into S3 for querying.

First you need to provide table name and location of data in S3.

post2_2

Next you need data format. In this case it was in TSV format.

post2_3

Next up is to define the column names and types.

post2_4

After skipping the partitioning the table is actually created. You’ll see following query running for a while depending on the file size.

post2_5

When the table is ready it can be queried with normal SQL. Here is an example of finding documents in which title begins with ‘Edel’. You’ll notice that the query is pretty fast and notifies the amount of data scanned. The data scanned is the basis for the pricing and in this case the cost was $5/TB * 0.00002665 TB = $0.00013325 for a single query.

post2_6

In order to make things a bit more interesting I wanted to see if the results could be visualized in QuickSight. It seems that Athena by default stores all the results of the queries in S3 on a day level folder structure.

post2_8

After this finding I created some summary analysis and did some grouping to the data.

post2_7

Next we hop into QuickSight side of things.

In QuickSight you once again start by adding new data set.

post2_9

Here you have an option to use data from S3 in addition to many other supported source systems.

post2_10

So I decided to add the results generated by Athena in S3 as a data set for QuickSight. Here is the sample manifest file I created and added in QuickSight. The manifest file has just list of the csv files in S3 to be included in the data set.

{
“fileLocations”: [
{“URIs”: [“https://s3.amazonaws.com/aws-athena-query-results-630692836673-us-east-1/Unsaved/2016/12/01/f8cfbc88-9b66-4477-9e41-04f46e42ed52.csv“
]}
],
“globalUploadSettings”: {
“textqualifier”: “\””
}
}

post2_11

Again after a while you’ll notice that the data set was created and be used as a basis for analysis in QuickSight.

post2_12

Next up is only to visualize the data and that can be achieved easily by just selecting the columns from the field list and voila! There is the data queried by Athena from compressed TSV file in S3 visualized by QuickSight.