Advanced SQL

Overview

Structured Query Language is an extremely powerful way to extract and process data from your business database. To help you write powerful queries, here are a few tricks of the trade for various scenarios you might encounter while using Chartio.

  1. Binning your Data
  2. Binning with a Subquery
  3. Histogram
  4. Simple Moving Average

Binning your Data

This is a way to make a category out of what is normally a range of values. Suppose you have a table of sales transactions, and you want to count the number of sales in certain ranges to get output like so:

Sale Amount Range Number of Sales
1 John 25 70000
2 Jill 24 80000

To produce these kinds of results with a SQL query, we can employ the CASE statement, a kind of conditional. For example, let's say we wanted to get a count of our users by age. But rather than get the count for each possible age from 1-99, we decide to bin them in deciles. The query would look like this:

SELECT 
    CASE 
        WHEN age > 10 AND age < 20 THEN 'teen'
        WHEN age >= 20 AND age < 30 THEN 'twenty something'
        WHEN age >= 30 AND age < 40 THEN 'thirty something'
        WHEN age >= 40 AND age < 50 THEN 'forty something'
        ELSE '50 or more' 
    END AS realage, 
    COUNT(*)
FROM
    users
GROUP BY
    realage;

The CASE syntax works likes this:

CASE
WHEN condition THEN 'bin name'
...
(optional) ELSE (this covers anything not explicitly mentioned in the bins above)
END AS column alias


Binning with a Subquery

Often, the contents for the various bins needs to be extracted via a subquery before they can be called with the CASE statement. In this example, we want to bin customer orders. As a quick note, a subquery is a SELECT statement nested within a SELECT statement.

 SELECT 
      CASE
             WHEN order_count > 10 AND order_count < 20 THEN '10-19'
             WHEN order_count >= 20 AND order_count < 30 THEN '20-29'
             WHEN order_count >= 30 AND order_count < 40 THEN '40-49'
             WHEN order_count >= 40 AND order_count < 50 THEN '50-59'
        ELSE '60 or greater' 
    END AS orderbins, 
COUNT(1) 
FROM
   (SELECT COUNT(orders_order) as order_count 
   FROM Orders GROUP BY Orders.customer_id) as subquery
GROUP BY ordersbins;


Histograms

Continuing with our previous example, rather than self-defining the order bins, we can create a histogram with equal intervals across the range of data. We do this by identifying the interval we'd like (in this example, we use 10), casting that result as a CHAR and concatenating it with the '<' sign.

SELECT CONCAT('< ', CAST((order_count DIV 10 + 1)*10 as CHAR)) as x, 
COUNT(1) 
FROM
   (SELECT COUNT(orders_order) as order_count 
   FROM Orders GROUP BY Orders.customer_id) as subquery
GROUP BY x 
ORDER BY order_count;


Simple Moving Average (SMA)

Suppose we have an orders table:

Date Qty
2011-01-01 5
2011-01-02 20
2011-01-03 15
2011-01-04 8
2011-01-05 4
2011-01-06 2
2011-01-07 33

and we'd like to visualize a moving average of order count. For our average, we'd like to use the last 5 days (on a rolling basis), so we employ the DATEDIFF function.

SELECT t1.date_ordered, 
     (SELECT SUM(t2.qty) / COUNT(t2.qty)
      FROM t AS t2
     WHERE DATEDIFF(t1.date_ordered, t2.date_ordered) BETWEEN 0 AND 4) AS '5dayMovingAvg'
FROM t AS t1
ORDER BY t1.dt;