Advanced SQL
Overview
Structured Query Language is an extremely powerful way to extract and process data from your business database. To help you write powerful queries, here are a few tricks of the trade for various scenarios you might encounter while using Chartio.
This is a way to make a category out of what is normally a range of values. Suppose you have a table of sales transactions, and you want to count the number of sales in certain ranges to get output like so:
| Sale Amount Range | Number of Sales | |||
|---|---|---|---|---|
| 1 | John | 25 | 70000 | 2 | Jill | 24 | 80000 |
To produce these kinds of results with a SQL query, we can employ the CASE statement, a kind of conditional. For example, let's say we wanted to get a count of our users by age. But rather than get the count for each possible age from 1-99, we decide to bin them in deciles. The query would look like this:
SELECT
CASE
WHEN age > 10 AND age < 20 THEN 'teen'
WHEN age >= 20 AND age < 30 THEN 'twenty something'
WHEN age >= 30 AND age < 40 THEN 'thirty something'
WHEN age >= 40 AND age < 50 THEN 'forty something'
ELSE '50 or more'
END AS realage,
COUNT(*)
FROM
users
GROUP BY
realage;
The CASE syntax works likes this:
CASE
WHEN condition THEN 'bin name'
...
(optional) ELSE (this covers anything not explicitly mentioned in the bins above)
END AS column alias
Often, the contents for the various bins needs to be extracted via a subquery before they can be called with the CASE statement. In this example, we want to bin customer orders. As a quick note, a subquery is a SELECT statement nested within a SELECT statement.
SELECT
CASE
WHEN order_count > 10 AND order_count < 20 THEN '10-19'
WHEN order_count >= 20 AND order_count < 30 THEN '20-29'
WHEN order_count >= 30 AND order_count < 40 THEN '40-49'
WHEN order_count >= 40 AND order_count < 50 THEN '50-59'
ELSE '60 or greater'
END AS orderbins,
COUNT(1)
FROM
(SELECT COUNT(orders_order) as order_count
FROM Orders GROUP BY Orders.customer_id) as subquery
GROUP BY ordersbins;
Continuing with our previous example, rather than self-defining the order bins, we can create a histogram with equal intervals across the range of data. We do this by identifying the interval we'd like (in this example, we use 10), casting that result as a CHAR and concatenating it with the '<' sign.
SELECT CONCAT('< ', CAST((order_count DIV 10 + 1)*10 as CHAR)) as x,
COUNT(1)
FROM
(SELECT COUNT(orders_order) as order_count
FROM Orders GROUP BY Orders.customer_id) as subquery
GROUP BY x
ORDER BY order_count;
Suppose we have an orders table:
| Date | Qty | |
|---|---|---|
| 2011-01-01 | 5 | |
| 2011-01-02 | 20 | |
| 2011-01-03 | 15 | |
| 2011-01-04 | 8 | |
| 2011-01-05 | 4 | |
| 2011-01-06 | 2 | |
| 2011-01-07 | 33 |
and we'd like to visualize a moving average of order count. For our average, we'd like to use the last 5 days (on a rolling basis), so we employ the DATEDIFF function.
SELECT t1.date_ordered,
(SELECT SUM(t2.qty) / COUNT(t2.qty)
FROM t AS t2
WHERE DATEDIFF(t1.date_ordered, t2.date_ordered) BETWEEN 0 AND 4) AS '5dayMovingAvg'
FROM t AS t1
ORDER BY t1.dt;
