Business Intelligence Buzzwords, Explained

There's lots of jargon in the Business Intelligence world. Below, you'll find a quick no-BS glossary of the most common concepts in the industry.

  1. Business Intelligence
  2. Data Warehousing
  3. ETL
  4. Data Marts
  5. OLAP
  6. ROLAP
  7. MOLAP
  8. HOLAP
  9. Aggregation
  10. Drill-Down
  11. Cubes
  12. In-Memory Analytics
  13. Ad Hoc Queries
  14. Columnar Databases
  15. Balanced Scorecard

Business Intelligence

Business intelligence is a general term to describe applying data analysis results to make business decisions.

Data Warehousing

A Data Warehouse is a database used for business analytics. Data warehouses pull in information from across a business and store it in a way that is optimized for analytical queries through a process known as Extract, Transform and Load (ETL), a confusing concept which we explain immediately below.

ETL ( Extraction, Transformation and Loading)

This refers to the three step process of creating a Data Warehouse.

Extraction -- pulling in data from across the business

Transformation -- manipulating the data according to certain rules and then loading it into a warehouse for analytics. This manipulation usually involves cleaning up messy data, creating derived metrics (e.g., sale_amount is a product of quantity * unit_price), joining related data and aggregating data (e.g., total sales across all stores or by region).

Loading -- loading this manipulated data into the Data Warehouse.

etl



Data Marts

Data marts are subsections of a data warehouse often used to organize the data into different sections for each business line, team or department.

Data Mart



OLAP (Online Analytical Processing)

OLAP is an often-overused term that describes an approach to organizing business data so that it's optimized for analytical queries. The point of OLAP is to organize data so that it can be seen from different angles. To achieve this, OLAP supports data aggregation, drill-down and slice & dice capabilities (these are defined below). Over the years, many strangely-named variations on OLAP have appeared, including: ROLAP, MOLAP and HOLAP (see below).

ROLAP

ROLAP (Relational On-line Analytical Processing) simply refers to analytics through the use of SQL queries against a relational database. This is in contrast to MOLAP (see below), which frequently requires that data be pre-processesed and loaded into a separate database for analytics.

MOLAP

MOLAP (Multi-dimensional On-line Analytical Processing) is a subcategory of OLAP that involves pre-processing business data into cubes so that it is optimized for analytics.

HOLAP

HOLAP (Hybrid On-line Analytical Processing) is a combination of ROLAP and MOLAP that attempts to leverage the advantages of both approaches.

Aggregation

Aggregation is a feature of OLAP that is equivalent to the SQL aggregation functions, such as SUM, COUNT, AVG, etc. For example, we could aggregate the value of all sales into a single number, total sales, by using a SUM aggregation.

Drill-Down

Drill-down is a feature of OLAP that lets business users explore data on increasingly higher levels of detail. This is equivalent to the use of WHERE clauses in SQL statements. For example, given total sales for the year, we could drill down to see sales by store or given sales by quarter, we could drill down into sales by month.

Cubes

Cubes are the building blocks of OLAP. They are a data structure in which business data, for example, sales data, are pre-aggregated along various dimensions. These dimensions might be Quarter, State, Employee, State, etc. The advantage of using cubes is that because the data is effectively pre-processed, the query speeds are faster. So if an analyst wants to see total sales by quarter or by employee, the use of a cube will speed up the time to an answer. The disadvantage is that it takes a good deal of time to create and doesn't perform as well on very large datasets.

olap

In-Memory Analytics

In-Memory BI has been one of the more prevalent buzzwords floating around recently. One of the bottlenecks in BI applications is slow disk or database access, which is significantly slower than RAM. One of the benefits of in-memory applications is that on-the-fly calculations are faster, and it typically can obviate a lot of time and effort that goes into optimizing data structures for a disk-based approach.

While it’s being marketed as a new innovation, in-memory BI has been around for a long time. It’s recently taken off because of the falling cost of RAM, which makes it much more economical to rely on an in-memory approach. That said, an in-memory approach does not necessary trump a disk-based approach as disk-based products can take advantage of RAM-based disk caches and in-memory products can take advantage of disk-based virtual memory.

Columnar Databases

The Columnar database (unlike its row-based counterparts) represents another class of technologies considered emerging in the BI space, though columnar stores have been around for some time. The following example illustrates the difference between row and column-oriented stores.

ID Name Age Salary
1 John 33 70000
2 Jill 30 80000

A row-oriented store will serialize the data row-wise:

1, John Smith, 25, 50000
2, Jane Fowler, 33, 53000

Whereas a column-oriented store will serialize the data by its columns:

1,2
John Smith, Jane Fowler
25, 33
50000, 53000

Roughly speaking, columnar stores are better for computing aggregates (averages, standard deviation, e.g.,) over many rows of data that only include a smaller subset of columns. In other words, if the above example also includes the names of John and Jane’s family member, their social security number and their home address, a column oriented store would skip reading those values, whereas a row-oriented store query would be forced to, taking longer.

Ad Hoc Query

The ability to ask business questions on the fly. BI tools typically have pre-prepared reports, usually curated by the IT department. An ad hoc report allows a business user to ask a custom question (query) outside the scope of those pre-prepared reports. Because these queries can put strain on a company’s database, they are typically run against a data warehouse designed especially for analytical purposes.

Balanced Scorecard

The balanced scorecard is essentially a map of metrics and targets for a business to monitor and achieve. It distills the most important metrics for a business, assigns various targets to meet on those metrics and alerts executives when they fall behind. Example measures include customer satisfaction as well as various forms of financial performance.