The Data Analysis Expressions (DAX) language includes a wide range of functions that help you refine your queries when retrieving data from a SQL Server Analysis Services (SSAS) tabular database. You’ve seen numerous examples of DAX functions throughout this series on the tabular model. One set of functions that are particularly useful when working with tabular data are the statistical functions, which support various ways to aggregate and analyze data. Statistical functions include not only the typical standbys, such as Min
and Max
, but also those that help structure that data into tables, such as AddColumns
and Summarize
.
In this article, the fifth in our series, we explore many of the DAX statistical functions and provide numerous examples that demonstrate how they work. As with the previous articles (one | two | three | four), the examples retrieve data from the AdventureWorks Tabular Model SQL 2012
tabular database, available as a SQL Server Data Tools (SSDT) tabular project from the AdventureWorks CodePlex site. I implemented the database on a local instance of SSAS 2012 in tabular mode and developed the examples against that database.
To follow along in this article, you should have a basic understanding of how to write a DAX query. The first two articles provide a good starting point if you’re new to DAX and SSAS tabular databases. Once you’ve read through those, you should have no problem applying the statistical functions we cover here.
Given the analytical nature of tabular databases, it’s not surprising that the statistical functions play such a key role; learning them is well worth the effort.
The Row Function
Often when retrieving and aggregating tabular data, you’ll want to return only a single row in order to get a quick overview of your data. That’s where the Row
function comes in. It returns a single-row table made up of one or more columns. You can use the function in conjunction with other functions in order to view different types of information. For example, the following query uses the Sum
and Format
functions in conjunction with the Row
function in order to return the total amount of sales from the Sales Amount
column in the Internet Sales
table:
1 2 3 4 5 6 7 |
evaluate ( row ( "Total Sales", format(sum('Internet Sales'[Sales Amount]), "Currency") ) ) |
As you’ve seen in the previous articles, you must use the Evaluate
statement to retrieve data from an SSAS tabular database. The statement takes one argument: a table or table expression. In this example, we use the Row
function to create the table expression.
For each column you want the Row
function to return, you must specify the column name in double quotes followed by an expression that defines the value to be returned, with a comma separating the two components. In the example above, the Row
function returns only one column named Total Sales
, and the column expression uses the Sum
function to return the sales total from Sales Amount
. The expression then uses the Format
function to return the value in a standard currency format (in US dollars). The following table shows the results returned by the Evaluate
statement:
Total Sales |
$29,358,677.22 |
If you want to return more than one column, when using the Row
function, you must include a name/expression pair for each one, separated by a comma, as shown in the following example:
1 2 3 4 5 6 7 8 9 10 |
evaluate ( row ( "Total Sales", format(sum('Internet Sales'[Sales Amount]), "Currency"), "Average Sales", format(average('Internet Sales'[Sales Amount]), "Currency"), "Highest Sale", format(max('Internet Sales'[Sales Amount]), "Currency"), "Lowest Sale", format(min('Internet Sales'[Sales Amount]), "Currency") ) ) |
Each column definition follows the same structure: column name followed by column expression. In this case, the column definitions differ only in the column name and the statistical function being applied. The Average
function returns the average sales amount, the Max
function return the highest sales amount, and the Min
function returns the lowest. As the following results show, the Evaluate
statement now returns four columns, but still one row:
Total Sales | Average Sales | Highest Sale | Lowest Sale |
$29,358,677.22 | $486.09 | $3,578.27 | $2.29 |
When using a statistical function such as Average
, it can also be useful to know how many rows are being evaluated. The following example includes both the Count
and DistinctCount
statistical functions:
1 2 3 4 5 6 7 8 9 10 |
evaluate ( row ( "Total Sales", format(sum('Internet Sales'[Sales Amount]), "Currency"), "Average Sales", format(average('Internet Sales'[Sales Amount]), "Currency"), "Count", count('Internet Sales'[Sales Amount]), "Distinct Count", distinctcount('Internet Sales'[Sales Amount]) ) ) |
The Count
function returns the total number of values in the specified column, and the DistinctCount
function returns the number of unique values in that column. The Count
function does not work for string values, only number and date columns. Because we’re using the function on the Sales Amount
column, it works fine because the column contains numerical data only. The following table shows the results now returned by the Evaluate
statement:
Total Sales | Average Sales | Count | Distinct Count |
$29,358,677.22 | $486.09 | 60398 | 42 |
When working with the Count
function, be aware that it counts only existing values. If a row is missing a value, that row is not counted. To get a count of all the rows in a table, you must use the CountRows
statistical function, as shown in the following example:
1 2 3 4 5 6 7 8 9 10 |
evaluate ( row ( "Average Cost", format(average('Product'[Standard Cost]), "Currency"), "Count", count('Product'[Standard Cost]), "Distinct Count", distinctcount('Product'[Standard Cost]), "Row Count", countrows('Product') ) ) |
In this case, the Standard
Cost
column in the Product
table contains a number of empty cells, so a count based on that column returns a number less than the number of rows in the table, but the CountRows
functions includes all rows, as the following results show:
Average Cost | Count | Distinct Count | Row Count |
$434.27 | 395 | 135 | 606 |
As you’ve seen, the Row
function can be useful whenever you need to return only a single row, which is often the case when aggregating data. However, the other statistical functions we’ve looked at are by no means limited to the Row
function. For example, you can use them with the Summarize
function when grouping data. Nevertheless, more often than not you’ll find that if you use the Row
function, you’ll be using the aggregate-type statistical functions as well.
The Topn Function
The Topn
function limits the rows returned by a query to the top number of rows specified when you call the function, similar to how a TOP
expression works in T-SQL. When you use the Topn
function, you must include three parameters. The first specifies the number of rows to return. The second is the table or table expression from which you’re extracting the top rows. And the third is an expression that specifies the order in which the rows in the table should be sorted before returning the top rows. Usually this is one or more columns in the target table.
For example, suppose you want to return the first five rows of the Sales Territory
table, with the data sorted by the SalesTerritoryKey
column. The following example shows how to call the Topn
function and pass in the parameter values:
1 2 3 4 |
evaluate ( topn(5, 'Sales Territory', [SalesTerritoryKey]) ) |
Once again, we start with an Evaluate
statement and specify the Top
n
function as the statement’s table expression. Into the function, we pass in the three required parameter values, which specify that the top five rows should be returned from the Sales Territory
table, sorted by the SalesTerritoryKey
column. The Evaluate
statement returns the following data:
SalesTerritoryKey |
Sales Territory Region |
Sales Territory Country |
Sales Territory Group |
Sales Territory Image |
7 |
France |
France |
Europe |
System.Byte[] |
9 |
Australia |
Australia |
Pacific |
System.Byte[] |
8 |
Germany |
Germany |
Europe |
System.Byte[] |
10 |
United Kingdom |
United Kingdom |
Europe |
System.Byte[] |
11 |
NA |
NA |
NA |
System.Byte[] |
One thing that might seem odd about the results is that they return rows with the SalesTerritoryKey
values of 7
through 11
. If you’re familiar with the data, you know that the first value is 1
. We’ll get into more details about this in a bit, but for now, know that the Topn
function, by default, sorts the data in descending order, unlike most sort operations you’ll come across.
Another odd aspect of these results is that they’re not ordered according to the SalesTerritoryKey
column, as you might expect. Although you have to specify the order when calling the Topn
function, that order applies only when determining which rows to return, not when ordering the result set. For that, you need to add an Order
By
clause that specifies the SalesTerritoryKey
column, as shown in the following example:
1 2 3 4 5 |
evaluate ( topn(5, 'Sales Territory', [SalesTerritoryKey]) ) order by 'Sales Territory'[SalesTerritoryKey] |
Now our statement will return the data in the specified order, as the following results show:
SalesTerritoryKey |
Sales Territory Region |
Sales Territory Country |
Sales Territory Group |
Sales Territory Image |
7 |
France |
France |
Europe |
System.Byte[] |
8 |
Germany |
Germany |
Europe |
System.Byte[] |
9 |
Australia |
Australia |
Pacific |
System.Byte[] |
10 |
United Kingdom |
United Kingdom |
Europe |
System.Byte[] |
11 |
NA |
NA |
NA |
System.Byte[] |
The Topn
function actually supports a fourth parameter, which determines whether the rows are sorted in ascending or descending order. As mentioned above, by default, they’re sorted in descending order. However, you can specify ascending or descending my adding a 0
for descending or a 1
for ascending. For example, the following Evaluate
statement returns the top five rows in ascending order:
1 2 3 4 5 |
evaluate ( topn(5, 'Sales Territory', [SalesTerritoryKey], 1) ) prder by 'Sales Territory'[SalesTerritoryKey] |
The only difference between this example and the previous one is the addition of the fourth parameter for the Topn
function. Now the statement returns the first five rows in the table, as shown in the following results:
SalesTerritoryKey |
Sales Territory Region |
Sales Territory Country |
Sales Territory Group |
Sales Territory Image |
1 |
Northwest |
United States |
North America |
System.Byte[] |
2 |
Northeast |
United States |
North America |
System.Byte[] |
3 |
Central |
United States |
North America |
System.Byte[] |
4 |
Southwest |
United States |
North America |
System.Byte[] |
5 |
Southeast |
United States |
North America |
System.Byte[] |
You can also use the Topn
function in conjunction with other functions. For example, the following Evaluate
statement returns the top five rows from the table returned by the Summarize
function:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
evaluate ( topn(5, summarize ( 'Internet Sales', 'Product'[Product Name], 'Date'[Calendar Year], "Sales Amount", format(sum('Internet Sales'[Sales Amount]), "currency") ), [Sales Amount], 0) ) order by [Sales Amount] desc |
In the previous examples, I specified the Sales
Territory
table as the second parameter of the Topn
function, but this time around I’m using the Summarize
function as the table expression. The function returns the Product Name
and Calendar Year
columns, along with a calculated column that provides a sales total for each group of data. I then use the Topn
function to return only the first five rows of the data returned by the Summarize
function. Notice that I specify the calculated column (Sales
Amount
) as the sort column, in descending order. The following table shows the results returned by the Evaluate
statement:
Product Name |
Calendar Year |
Sales Amount |
Mountain-100 Silver, 38 |
2006 |
$98,599.71 |
Mountain-100 Silver, 38 |
2005 |
$98,599.71 |
Mountain-100 Black, 44 |
2005 |
$97,874.71 |
Mountain-100 Black, 38 |
2006 |
$94,499.72 |
Mountain-100 Black, 42 |
2006 |
$91,124.73 |
As you can see, the data now includes only the three columns specified in the Summarize
function, with the data sorted in descending order, based on the Sales
Amount
column.
The Rollup Function
You’ve already gotten a taste of the Rollup
function in the second article of this series. This time around, however, we’ll go into the function a little deeper and look at a couple other functions associated with Rollup
.
The Rollup
function is used in conjunction with the Summarize
function. The Summarize
function groups data together by the columns specified when you call the function. The function then returns a table that can be used by other functions, wherever a parameter requires a table value. You can also use the function as the table expression in an Evaluate
statement.
The Summarize
function supports the use of the Rollup
function as part of the Summarize
definition. The Rollup
function adds roll-up rows to the returned table based on the columns used to group the data. The roll-up rows provide an additional layer of aggregation, above what you’ve already defined in your Summarize
function. Let’s look at example to better understand how this works. In the following Evaluate
statement, we use the Summarize
function as the statement’s table expression:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
evaluate ( summarize ( calculatetable ( 'Internet Sales', 'Product Category'[Product Category Name] = "Accessories" ), 'Product'[Product Name], rollup('Date'[Calendar Year]), "Total Sales", format(sum('Internet Sales'[Sales Amount]), "currency") ) ) order by 'Product'[Product Name], 'Date'[Calendar Year] |
The first parameter in the Summarize
function requires its own table or table expression as its value. In this case, I’ve used a CalculateTable
function to create that expression. The function returns the Internet Sales
table, but limits the rows returned to those with a Product Category Name
value equal to Accessories
.
The second parameter value in the Summarize
function specifies the Product Name
column, which means the data will be grouped based on the values in that column.
The third value specifies the Calendar Year
column. As a result, each product will be grouped by the year it was sold. However, I’ve also included the Rollup
function in this expression. That means an additional row will be returned for each product that rolls up the data into a total for all years.
The last parameter value in the Summarize
function is a calculated column named Total Sales
. The column uses the Sum
aggregate function to total the Sales Amount
values for each grouping. I’ve also used the F
ormat
function to return the data in a currency format. The following table shows the results returned by the Evaluate
statement:
Product Name |
Calendar Year |
Total Sales |
All-Purpose Bike Stand |
$39,591.00 |
|
All-Purpose Bike Stand |
2007 |
$18,921.00 |
All-Purpose Bike Stand |
2008 |
$20,670.00 |
Bike Wash – Dissolver |
$7,218.60 |
|
Bike Wash – Dissolver |
2007 |
$3,044.85 |
Bike Wash – Dissolver |
2008 |
$4,173.75 |
Fender Set – Mountain |
$46,619.58 |
|
Fender Set – Mountain |
2007 |
$19,408.34 |
Fender Set – Mountain |
2008 |
$27,211.24 |
Hitch Rack – 4-Bike |
$39,360.00 |
|
Hitch Rack – 4-Bike |
2007 |
$16,440.00 |
Hitch Rack – 4-Bike |
2008 |
$22,920.00 |
HL Mountain Tire |
$48,860.00 |
|
HL Mountain Tire |
2007 |
$20,300.00 |
HL Mountain Tire |
2008 |
$28,560.00 |
HL Road Tire |
$27,970.80 |
|
HL Road Tire |
2007 |
$12,877.00 |
HL Road Tire |
2008 |
$15,093.80 |
Hydration Pack – 70 oz. |
$40,307.67 |
|
Hydration Pack – 70 oz. |
2007 |
$16,771.95 |
Hydration Pack – 70 oz. |
2008 |
$23,535.72 |
As with the previous examples, the results shown here are only part of the returned data, but you can see that totals are returned for each product sold and year it was sold. You can also see the roll-up row for each product-the rows without a Calendar Year
value.
There might be times when you want to mark in some way when a row is a roll-up, other than just depending on blank values in the roll-up column. DAX includes the IsSubtotal
function that you can use in conjunction with the Rollup
function to create a column that indicates when a row is a roll-up. The IsSubtotal
function returns a value of True
if the row is a roll-up. Otherwise, the function returns False
. The following example demonstrates how to use the IsSubtotal
function by creating an additional column named Is
Rollup
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
evaluate ( summarize ( calculatetable ( 'Internet Sales', 'Product Category'[Product Category Name] = "Accessories" ), 'Product'[Product Name], rollup('Date'[Calendar Year]), "Total Sales", format(sum('Internet Sales'[Sales Amount]), "currency"), "Is Rollup", IsSubtotal('Date'[Calendar Year]) ) ) order by 'Product'[Product Name], 'Date'[Calendar Year] |
The Evaluate
statement is the same as the one shown in the preceding example, except for the addition of the Is Rollup
column. Notice that the expression that defines the column includes the IsSubtotal
function. When calling the function, you need to include the name of the column specified in the Rollup
function, in this case the Calendar Year
column. The Evaluate
statement returns the following results:
Product Name |
Calendar Year |
Total Sales |
Is Rollup |
All-Purpose Bike Stand |
$39,591.00 |
True |
|
All-Purpose Bike Stand |
2007 |
$18,921.00 |
False |
All-Purpose Bike Stand |
2008 |
$20,670.00 |
False |
Bike Wash – Dissolver |
$7,218.60 |
True |
|
Bike Wash – Dissolver |
2007 |
$3,044.85 |
False |
Bike Wash – Dissolver |
2008 |
$4,173.75 |
False |
Fender Set – Mountain |
$46,619.58 |
True |
|
Fender Set – Mountain |
2007 |
$19,408.34 |
False |
Fender Set – Mountain |
2008 |
$27,211.24 |
False |
Hitch Rack – 4-Bike |
$39,360.00 |
True |
|
Hitch Rack – 4-Bike |
2007 |
$16,440.00 |
False |
Hitch Rack – 4-Bike |
2008 |
$22,920.00 |
False |
HL Mountain Tire |
$48,860.00 |
True |
|
HL Mountain Tire |
2007 |
$20,300.00 |
False |
HL Mountain Tire |
2008 |
$28,560.00 |
False |
HL Road Tire |
$27,970.80 |
True |
|
HL Road Tire |
2007 |
$12,877.00 |
False |
HL Road Tire |
2008 |
$15,093.80 |
False |
Hydration Pack – 70 oz. |
$40,307.67 |
True |
|
Hydration Pack – 70 oz. |
2007 |
$16,771.95 |
False |
Hydration Pack – 70 oz. |
2008 |
$23,535.72 |
False |
The result set now includes the Is
Rollup
column and is populated with True
and False
values, with each roll-up row receiving a value of True
.
Now suppose that instead of grouping our data by product and then year sold, we group it by product category, then product subcategory, and finally the year sold, as shown in the following example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
evaluate ( summarize ( 'Internet Sales', 'Product Category'[Product Category Name], 'Product Subcategory'[Product Subcategory Name], rollup('Date'[Calendar Year]), "Total Sales", format(sum('Internet Sales'[Sales Amount]), "currency"), "Is Rollup", IsSubtotal('Date'[Calendar Year]) ) ) order by 'Product Category'[Product Category Name], 'Product Subcategory'[Product Subcategory Name], 'Date'[Calendar Year] |
As in the previous examples, we’re rolling up our data based on the Calendar
Year
column. Consequently, the statement will provide a roll-up row for each category/subcategory grouping, as shown in the following results:
Product Category Name |
Product Subcategory Name |
Calendar Year |
Total Sales |
Is Rollup |
Accessories |
Bike Racks |
$39,360.00 |
True |
|
Accessories |
Bike Racks |
2007 |
$16,440.00 |
False |
Accessories |
Bike Racks |
2008 |
$22,920.00 |
False |
Accessories |
Bike Stands |
$39,591.00 |
True |
|
Accessories |
Bike Stands |
2007 |
$18,921.00 |
False |
Accessories |
Bike Stands |
2008 |
$20,670.00 |
False |
Accessories |
Bottles and Cages |
$56,798.19 |
True |
|
Accessories |
Bottles and Cages |
2007 |
$23,280.27 |
False |
Accessories |
Bottles and Cages |
2008 |
$33,517.92 |
False |
Accessories |
Cleaners |
$7,218.60 |
True |
|
Accessories |
Cleaners |
2007 |
$3,044.85 |
False |
Accessories |
Cleaners |
2008 |
$4,173.75 |
False |
Accessories |
Fenders |
$46,619.58 |
True |
|
Accessories |
Fenders |
2007 |
$19,408.34 |
False |
Accessories |
Fenders |
2008 |
$27,211.24 |
False |
Accessories |
Helmets |
$225,335.60 |
True |
|
Accessories |
Helmets |
2007 |
$92,583.54 |
False |
Accessories |
Helmets |
2008 |
$132,752.06 |
False |
Accessories |
Hydration Packs |
$40,307.67 |
True |
|
Accessories |
Hydration Packs |
2007 |
$16,771.95 |
False |
Accessories |
Hydration Packs |
2008 |
$23,535.72 |
False |
Accessories |
Tires and Tubes |
$245,529.32 |
True |
|
Accessories |
Tires and Tubes |
2007 |
$103,259.76 |
False |
Accessories |
Tires and Tubes |
2008 |
$142,269.56 |
False |
Bikes |
Mountain Bikes |
$9,952,759.56 |
True |
|
Bikes |
Mountain Bikes |
2005 |
$585,973.27 |
False |
Bikes |
Mountain Bikes |
2006 |
$1,562,456.76 |
False |
Bikes |
Mountain Bikes |
2007 |
$3,989,638.48 |
False |
Bikes |
Mountain Bikes |
2008 |
$3,814,691.06 |
False |
This level of roll-up is probably fine for some of your needs, but at times, you might find it handy to return a roll-up row for each category, as well as each category/subcategory grouping. The easiest way to do that is to add the Product
Subcategory
Name
column to the Rollup
function, as shown in the following example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
evaluate ( summarize ( 'Internet Sales', 'Product Category'[Product Category Name], rollup('Product Subcategory'[Product Subcategory Name], 'Date'[Calendar Year]), "Total Sales", format(sum('Internet Sales'[Sales Amount]), "currency"), "Is Rollup", IsSubtotal('Date'[Calendar Year]) ) ) order by 'Product Category'[Product Category Name], 'Product Subcategory'[Product Subcategory Name], 'Date'[Calendar Year] |
When adding an additional column to the Rollup
function, you must separate the column names with a comma, but that’s all you’ll need to do. As the following table shows, the results now include a roll-up row for each product category:
Product Category Name |
Product Subcategory Name |
Calendar Year |
Total Sales |
Is Rollup |
Accessories |
$700,759.96 |
True |
||
Accessories |
Bike Racks |
$39,360.00 |
True |
|
Accessories |
Bike Racks |
2007 |
$16,440.00 |
False |
Accessories |
Bike Racks |
2008 |
$22,920.00 |
False |
Accessories |
Bike Stands |
$39,591.00 |
True |
|
Accessories |
Bike Stands |
2007 |
$18,921.00 |
False |
Accessories |
Bike Stands |
2008 |
$20,670.00 |
False |
Accessories |
Bottles and Cages |
$56,798.19 |
True |
|
Accessories |
Bottles and Cages |
2007 |
$23,280.27 |
False |
Accessories |
Bottles and Cages |
2008 |
$33,517.92 |
False |
Accessories |
Cleaners |
$7,218.60 |
True |
|
Accessories |
Cleaners |
2007 |
$3,044.85 |
False |
Accessories |
Cleaners |
2008 |
$4,173.75 |
False |
Accessories |
Fenders |
$46,619.58 |
True |
|
Accessories |
Fenders |
2007 |
$19,408.34 |
False |
Accessories |
Fenders |
2008 |
$27,211.24 |
False |
Accessories |
Helmets |
$225,335.60 |
True |
|
Accessories |
Helmets |
2007 |
$92,583.54 |
False |
Accessories |
Helmets |
2008 |
$132,752.06 |
False |
Accessories |
Hydration Packs |
$40,307.67 |
True |
|
Accessories |
Hydration Packs |
2007 |
$16,771.95 |
False |
Accessories |
Hydration Packs |
2008 |
$23,535.72 |
False |
Accessories |
Tires and Tubes |
$245,529.32 |
True |
|
Accessories |
Tires and Tubes |
2007 |
$103,259.76 |
False |
Accessories |
Tires and Tubes |
2008 |
$142,269.56 |
False |
Bikes |
$28,318,144.65 |
True |
||
Bikes |
Mountain Bikes |
$9,952,759.56 |
True |
|
Bikes |
Mountain Bikes |
2005 |
$585,973.27 |
False |
Bikes |
Mountain Bikes |
2006 |
$1,562,456.76 |
False |
Bikes |
Mountain Bikes |
2007 |
$3,989,638.48 |
False |
Bikes |
Mountain Bikes |
2008 |
$3,814,691.06 |
False |
The Summarize
function supports one other related function worth mentioning: RollupGroup
. When used in conjunction with the Rollup
function, the RollupGroup
function prevents partial subtotals from being included in the result set. Let’s look at another example to demonstrate how this works. In the following Evaluate
statement, I’ve added the RollupGroup
function as part of the Rollup
function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
evaluate ( summarize ( 'Internet Sales', 'Product Category'[Product Category Name], rollup(rollupgroup('Product Subcategory'[Product Subcategory Name], 'Date'[Calendar Year])), "Total Sales", format(sum('Internet Sales'[Sales Amount]), "currency"), "Is Rollup", IsSubtotal('Date'[Calendar Year]) ) ) order by 'Product Category'[Product Category Name], 'Product Subcategory'[Product Subcategory Name], 'Date'[Calendar Year] |
The column names are passed into the RollupGroup
function, and that function is passed to the Rollup
function. As to be expected, the statement now returns slightly different results:
Product Category Name |
Product Subcategory Name |
Calendar Year |
Total Sales |
Is Rollup |
Accessories |
$700,759.96 |
True |
||
Accessories |
Bike Racks |
2007 |
$16,440.00 |
False |
Accessories |
Bike Racks |
2008 |
$22,920.00 |
False |
Accessories |
Bike Stands |
2007 |
$18,921.00 |
False |
Accessories |
Bike Stands |
2008 |
$20,670.00 |
False |
Accessories |
Bottles and Cages |
2007 |
$23,280.27 |
False |
Accessories |
Bottles and Cages |
2008 |
$33,517.92 |
False |
Accessories |
Cleaners |
2007 |
$3,044.85 |
False |
Accessories |
Cleaners |
2008 |
$4,173.75 |
False |
Accessories |
Fenders |
2007 |
$19,408.34 |
False |
Accessories |
Fenders |
2008 |
$27,211.24 |
False |
Accessories |
Helmets |
2007 |
$92,583.54 |
False |
Accessories |
Helmets |
2008 |
$132,752.06 |
False |
Accessories |
Hydration Packs |
2007 |
$16,771.95 |
False |
Accessories |
Hydration Packs |
2008 |
$23,535.72 |
False |
Accessories |
Tires and Tubes |
2007 |
$103,259.76 |
False |
Accessories |
Tires and Tubes |
2008 |
$142,269.56 |
False |
Bikes |
$28,318,144.65 |
True |
||
Bikes |
Mountain Bikes |
2005 |
$585,973.27 |
False |
Bikes |
Mountain Bikes |
2006 |
$1,562,456.76 |
False |
Bikes |
Mountain Bikes |
2007 |
$3,989,638.48 |
False |
Bikes |
Mountain Bikes |
2008 |
$3,814,691.06 |
False |
A roll-up row is included for each category, but not for the subcategories. As you can see, DAX provides a number of ways to roll up your data so your results include exactly the information you need.
The Crossjoin Function
The last function that we’ll look at is Crossjoin
. You use a Crossjoin
function to join two or more tables in order to return a Cartesian product, a result set in which each row in each table is joined with each row in each other table. The result set is the product of the number of rows from each table. For example, if one table contains five rows and a second table contains seven rows, the Cartesian product for those two tables will contain 35 rows.
This will all become clearer as we work through our examples. But first, take a look at the following Evaluate
statement, in which we retrieve data from the Internet Sales
table:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
evaluate ( summarize ( 'Internet Sales', 'Product Category'[Product Category Name], 'Date'[Calendar Year], "Total Sales", format(sum('Internet Sales'[Sales Amount]), "currency") ) ) order by 'Product Category'[Product Category Name], 'Date'[Calendar Year] |
The statement contains nothing that you haven’t seen before. It groups data based on the Product
Category
and Calendar
Year
columns and provides the sales totals for each group, as shown in the following results:
Product Category Name |
Calendar Year |
Total Sales |
Accessories |
2007 |
$293,709.71 |
Accessories |
2008 |
$407,050.25 |
Bikes |
2005 |
$3,266,373.66 |
Bikes |
2006 |
$6,530,343.53 |
Bikes |
2007 |
$9,359,102.62 |
Bikes |
2008 |
$9,162,324.85 |
Clothing |
2007 |
$138,247.97 |
Clothing |
2008 |
$201,524.64 |
The Evaluate
statement returns just what we’d expect. However, what’s important about these results are not what they include, but what is missing. For example, the Product Category
column in the source table includes the value Components
, but that value isn’t included in the results because no sales are associated with that category. In addition, years are displayed only if they have sales associated with them. For instance, the results include information about the Accessories category for 2007 and 2008, but no other years are listed.
Often, we’ll want our results to be more complete than what we’re able to generate with a simple Summarize
function and for that we need the Crossjoin
function. It returns a Cartesian product, and we can use that data to generate more complete results.
Let’s look at an example to give you an idea of how the Crossjoin
function works. In the following Evaluate
statement, I use the Crossjoin
function as the statement’s table expression:
1 2 3 4 5 6 7 8 9 10 11 |
evaluate ( crossjoin ( values('Product Category'[Product Category Name]), values('Date'[Calendar Year]) ) ) order by 'Product Category'[Product Category Name], 'Date'[Calendar Year] |
I pass two arguments into the Crossjoin
function. Each argument must return a table. In this case, I use the Values
function for each argument in order to return a one-column table of distinct values. The first Values
function returns a list of values from the Product Category Name
column, and the second Values
function returns a list of values from the Calendar Year
column. (These are the same two columns used to group data in the previous example.) The Evaluate
statement returns the results shown in the following table:
Product Category Name |
Calendar Year |
2005 |
|
2006 |
|
2007 |
|
2008 |
|
2009 |
|
2010 |
|
Accessories |
2005 |
Accessories |
2006 |
Accessories |
2007 |
Accessories |
2008 |
Accessories |
2009 |
Accessories |
2010 |
Bikes |
2005 |
Bikes |
2006 |
Bikes |
2007 |
Bikes |
2008 |
Bikes |
2009 |
Bikes |
2010 |
Clothing |
2005 |
Clothing |
2006 |
Clothing |
2007 |
Clothing |
2008 |
Clothing |
2009 |
Clothing |
2010 |
Components |
2005 |
Components |
2006 |
Components |
2007 |
Components |
2008 |
Components |
2009 |
Components |
2010 |
Each row in each table has been joined with each row in the other table. For example, the Accessories
value in the Product Category Name
column is matched with each year in the Calendar Year
column. The reason that there are blank values in the Product Category Name
column is because the source data-the Product Category
table in the tabular database-includes a blank member, and that blank member is matched with each year, just like the other Product Category Name
values. Because the Product Category
column contains five distinct values, and the Calendar Year
column contains six distinct values, our query returns 30 rows.
Now that we know how to use the Crossjoin
function to produce the Cartesian product, we can include it in our Evaluate
statement to return more meaningful data. The following example uses the Crossjoin
function as the table expression within the Summarize
function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
evaluate ( summarize ( crossjoin ( values('Product Category'[Product Category Name]), values('Date'[Calendar Year]) ), 'Product Category'[Product Category Name], 'Date'[Calendar Year], "Total Sales", format(sum('Internet Sales'[Sales Amount]), "currency") ) ) order by 'Product Category'[Product Category Name], 'Date'[Calendar Year] |
The Summarize
function is now working with a Cartesian
product as its table. We can then group the data based on the Product
Category
Name
column and the Calendar
Year
column, both of which are including in the Cartesian product. Now the Evaluate
statement returns the following results:
Product Category Name |
Calendar Year |
Total Sales |
2005 |
||
2006 |
||
2007 |
||
2008 |
||
2009 |
||
2010 |
||
Accessories |
2005 |
|
Accessories |
2006 |
|
Accessories |
2007 |
$293,709.71 |
Accessories |
2008 |
$407,050.25 |
Accessories |
2009 |
|
Accessories |
2010 |
|
Bikes |
2005 |
$3,266,373.66 |
Bikes |
2006 |
$6,530,343.53 |
Bikes |
2007 |
$9,359,102.62 |
Bikes |
2008 |
$9,162,324.85 |
Bikes |
2009 |
|
Bikes |
2010 |
|
Clothing |
2005 |
|
Clothing |
2006 |
|
Clothing |
2007 |
$138,247.97 |
Clothing |
2008 |
$201,524.64 |
Clothing |
2009 |
|
Clothing |
2010 |
|
Components |
2005 |
|
Components |
2006 |
|
Components |
2007 |
|
Components |
2008 |
|
Components |
2009 |
|
Components |
2010 |
As you can see, all the data is included in the results, whether or not any sales were made for a particular category in a particular year. This way, you know you’re returning the complete results, which can often give you a better picture of the data-and point to where potential problems might exist.
DAX Statistical Functions
You should now have a good sense of how to use the Row
, Topn
, Rollup
, and Crossjoin
functions, along with several of the aggregate statistical functions. Once you know the basics of how to use them, you should be able to apply them to other situations. DAX also supports a number of other statistical functions, as well as additional functions types, such as date and time functions, filter functions, and logical functions. Be sure to check out the TechNet article “DAX Function Reference” for information about those other functions as well as additional details about the functions we’ve covered.
Load comments