{"id":2165,"date":"2016-02-08T00:00:00","date_gmt":"2016-02-08T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/sql-group-by-basics\/"},"modified":"2021-09-29T16:21:22","modified_gmt":"2021-09-29T16:21:22","slug":"sql-group-by-basics","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/sql-group-by-basics\/","title":{"rendered":"SQL GROUP BY Basics"},"content":{"rendered":"<div id=\"pretty\">\n<p class=\"start\">The <code>GROUP BY<\/code> and <code>HAVING<\/code> clauses in SQL goes all the way back to the start of the language, in its simplest form. The original idea was that you would build a working table using the <code>SELECT ... FROM ... WHERE<\/code> clauses, and then partition that result set into groups. A group is defined as a set of rows with the same values for the grouping columns. The non-grouping columns had to be group properties (usually aggregate functions), formulas or computations done on these or constants (which is technically a group property).<\/p>\n<p>Each group is then reduced down to a single row. The important concept here is the original table no longer exists. However, if you are old enough to have to work with the very early versions of SQL Server, the Sybase syntax did not get rid of the original table. This led to some problems. It exposed the original row data with the aggregated data.<\/p>\n<p>The original <code>HAVING<\/code> clause was then applied to each of the rows in the new grouped table. This meant that it was done after the <code>WHERE<\/code> clause that built the original table.<\/p>\n<p>Later we added Windows clauses, extensions to the <code>GROUP BY<\/code> in the form of <code>ROLLUP<\/code>, <code>CUBE<\/code> and <code>GROUPING SETS<\/code>. But there is still quite a bit you can do with the original simple <code>GROUP BY<\/code> &#8211; <code>HAVING<\/code> constructs. They have the advantage of completing their work in a single table scan, while the newer constructs may need temporary storage or multiple passes through the data. The originals are portable and predictable. They are now well-optimized after all these decades.<\/p>\n<h2>\u00a0<b>Basic Aggregate Functions<\/b><\/h2>\n<p>In the original SQL, we had a limited set of simple aggregate functions. Their general format was &#8216;<code>&lt;function name&gt; ([DISTINCT| ALL] &lt;expression&gt;)<\/code>&#8220;. The first thing these functions do is to remove all of the <code>NULL<\/code>s from the data. The function could be applied to an empty set, if only <code>NULL<\/code>s were returned. What is the sum or average of an empty set? Following the usual conventions of SQL, empty sets return <code>NULL<\/code>, <i>not zero.<\/i> Zero is an actual value; that was a very big debate in the Middle Ages, when Europeans started seeing zero showing up in those newfangled Hindu Arabic numerals<\/p>\n<p>SQL has a convention that we can put keywords inside the parameters \u00a0in some of the intrinsic system functions. The aggregate functions have the option of <code>DISTINCT <\/code>or <code>ALL<\/code>; if nothing is given, then <code>ALL<\/code> is assumed.<\/p>\n<div class=\"pullout\">\n<p><code>SUM ([DISTINCT| ALL] &lt;numeric expression&gt;) <br \/> <\/code> = arithmetic total of a numeric expression.<\/p>\n<p><code>AVG ([DISTINCT| ALL] &lt;numeric expression&gt;) <br \/> <\/code>= arithmetic average or mean of a numeric expression.<\/p>\n<p><code>\u00a0MIN ([DISTINCT| ALL] &lt;expression&gt;)<\/code> <br \/> = depending on the expression, the smallest numeric value, the least recent temporal value, or the lowest collation value in a string.<\/p>\n<p><code>\u00a0MAX ([DISTINCT| ALL] &lt;expression&gt;)<\/code> <br \/> = depending on the expression, the greatest numeric value, the temporal value for the wrist in the future, or the highest collation value in a string.<\/p>\n<p><code>\u00a0COUNT ([DISTINCT| ALL] &lt;expression&gt;) <br \/> <\/code> = the cardinality of the set of expression values, regardless of type.<\/p>\n<p><code>\u00a0COUNT ([DISTINCT| ALL] *)<\/code> <br \/> = the cardinality of the table, taken as a whole set. <b> <\/b><\/p>\n<\/div>\n<p>The <code>DISTINCT<\/code> keyword says remove all redundant duplicates (i.e. keep one copy) from the set of expressions being aggregated. The <code>ALL<\/code> keyword says to retain all the rows (this is like<code> UNION<\/code> versus <code>UNION ALL<\/code> and so forth in other parts of the language). In practice, nobody actually uses the <code>ALL<\/code> option; in fact, a lot of the newer SQL programmers do not even know it exists.<\/p>\n<p>It may come as a surprise that &lt;expression&gt; as a parameter to an aggregate function does not have to be just a simple column name. However, it does have some limitations, based on scoping. This is why you cannot nest simple aggregate functions inside each other. Think about<code> SUM (AVG (x))<\/code> and what it means. First, we would compute the average. It would be a single value (or a <code>NULL<\/code>). The sum of a single value is just that single value (or a <code>NULL<\/code>).<\/p>\n<p>Also, the expression has to be of an appropriate data type. The <code>SUM ()<\/code> and <code>AVG ()<\/code> are clearly numeric. It would not make much sense to take an average of a result set of dates or strings, would it? However, the scale and precision of the results are implementation defined in ANSI\/ISO standards. You need to know your product; Oracle, DB2, and SQL Server are not the same in the way they handle this.<\/p>\n<p>The <code>MIN ()<\/code> and <code>MAX ()<\/code> are called the &#8216;extrema&#8217; functions. They work for all data types in SQL but with some caveats. Technically, the <code>DISTINCT<\/code> and <code>ALL<\/code> options exist, but <code>DISTINCT <\/code>is insanely redundant. If you use it, I am pretty sure any optimizer will catch it. As an aside, when the <code>MIN ()<\/code> or <code>MAX ()<\/code> is computed, the optimizer will take the first value as the max (or min) and compare to the following values; if it finds a greater (or lesser) value, that becomes the new max (or min). This is fast and simple scan. But you will see SQL written with <code>ROW_NUMBER()<\/code> <code>OVER(..)<\/code> followed by filters for a row number equal to one to do this same task! All that does is force a useless sort on the data,<\/p>\n<p>Numeric values have a well-defined order, temporal data has a well-defined order, but strings depend on their collation to determine their order. Even the same language can have several collations; German is one example. Even worse, in SQL varying length strings have to be padded out with trailing blanks before they are compared. Thus, &#8216;Joe&#8217;, &#8216;Joe &#8216; and &#8216;Joe &#8216; are all equal. But which one represents that group?<\/p>\n<p>This is why we like to design encodings that use only the basic ASCII subset that ISO requires for all Unicode languages. But that is another topic.<\/p>\n<p>As usual, <code>NULL<\/code>s pose their own problem. Technically a <code>NULL<\/code> is not equal anything, not even another <code>NULL<\/code>. Following that model, each <code>NULL<\/code> in the result set should form its own group. We actually debated that in ANSI X3H2 in the early days.<\/p>\n<p>The decision was to have both equality (=) and grouping. When grouping is used, all <code>NULL<\/code>s are treated as a single group. Well, it is technically called an equivalence class for anybody that had a course in set theory. This was also one reason that we dropped <code>NULL<\/code>s from the result set. This convention is actually pretty handy in practice. <b> \u00a0<\/b><\/p>\n<h2><b> Group Characteristics<\/b><\/h2>\n<p>It is often useful to look at groupings that have some characteristic in common <i>without actually looking at the particulars. <\/i> This is done in the <code>HAVING<\/code> clause, which deals with the group characteristics. Doing the work in the <code>HAVING<\/code> clause can save complicated joins, <strong>CASE<\/strong> expressions and subqueries.<\/p>\n<p>The most useful tools are usually the simplest. The SIGN (), properly called the &#8216;signum&#8217;, the absolute value, <code>ABS ()<\/code>, and simple math are probably the most useful. Here are some examples: <b><\/b><\/p>\n<pre lang:tsql=\"\" theme:none=\"\">HAVING COUNT (DISTINCT &lt;expression&gt;) = COUNT (&lt;expression&gt;)<\/pre>\n<p>We know &lt;expression&gt; has all distinct values, otherwise the left hand side would be smaller than the right hand side count.<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">\u00a0HAVING COUNT (*) = COUNT (&lt;expression&gt;); <\/pre>\n<p>There are no <code>NULL<\/code>s in the results. The cardinality on the left side would count them, but the right hand side would not.<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">\u00a0HAVING MIN (&lt;expression&gt;) = MAX (&lt;expression&gt;) <\/pre>\n<p>This tell us that the group has only one value. This is actually quite handy. Variations on this idea are:<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">\u00a0HAVING MIN (SIGN (&lt;expression&gt;)) = MAX (SIGN (&lt;expression&gt;)) <\/pre>\n<p>We now know the &lt;expression&gt; is all positive, all negative or all zero values.<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">\u00a0HAVING MIN (ABS (&lt;expression&gt;)) = 0;   \t<\/pre>\n<p>Tells us we have at least one zero.<\/p>\n<h2>\u00a0<b>Programming Technique For Grouping<\/b><\/h2>\n<p><b> \u00a0<\/b>The basic programming technique for grouping is fairly simple and a recipe to follow. First, write CTE or select statement that gets the groups. That sounds kind of obvious, but it is a little more complicated. Most of the time, the grouping attributes are in the original set.<\/p>\n<p>The common example would be geographic codes, such as state codes. These are nice discrete values that make obvious groups. They are probably well understood by the people using the data.<\/p>\n<p>\u00a0Sometimes the groups may have to be computed. For example, if you have information about a birthdate, you might want to partition them into age groups (0 to 10, 11 to 14, 15 to 18, 18 to 25 and finally the horrible 65+ years of age). This still gives you nice sharp categories. However, you now have to go to a join on the table with the range of each group to get your working table<\/p>\n<p>However, if we use approximate numeric values for the groups got all kinds of worries about rounding errors and things on the boundaries. This is the worst possible situation.<\/p>\n<p>Once you decide what your groups are, we go to step number two. This is simply do the group by. Technically we do not actually need a <code>GROUP BY<\/code> clause. If it is not given, then the entire result set is treated as if it is a single group. Originally the group by was a simple list of columns from the query. The columns in this list did not have to appear in the <code>SELECT<\/code> statement (this was a myth from the very early days of SQL).<\/p>\n<p>The result of the<code> SELECT.. FROM.. WHERE.<\/code>. clauses is partitioned into disjoint subsets, which all have the same value in each column or are all <code>NULL<\/code>s. This is what set theory calls an equivalence class and there is lots of them. The<code> mod (n)<\/code> function is another example, which gives us groups defined by 0, 1, 2, .., (n -1) values.<\/p>\n<p>At one point in the history of the ANSI\/ISO standards, we got interested in OLAP processing and added <a href=\"https:\/\/www.simple-talk.com\/sql\/t-sql-programming\/questions-about-cube,-rollup-and-grouping-sets-that-you-were-too-shy-to-ask\/\"> cubes, rollups, grouping sets<\/a> and a grand total. All of these can be defined using the <code>GROUPING SETS<\/code> construct, but it actually gets pretty awkward to do it that way.<b><\/b><\/p>\n<h3><b> \u00a0HAVING Clause<\/b><\/h3>\n<p>Finally, we worry about using the having clause which actually can in some ways be the simplest of all of this. It is really just a <code>WHERE<\/code> clause that applies to the grouped table. We are back to simple search conditions again!<\/p>\n<p>We do not actually need the <code>HAVING<\/code> clause; if it is not there. We simply stop the point we have done the group by. But this is where we get some real power to pick out grouped character grouping characteristics.<\/p>\n<h2>\u00a0<b>Example Queries<\/b><\/h2>\n<p>A simple example of this technique will give you an idea how it works. Here a skeleton table for the example.<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">CREATE TABLE Sales\r\n(salesman_name VARCHAR (10) NOT NULL, \r\n\u00a0product_name VARCHAR (10) NOT NULL\r\n\u00a0\u00a0CHECK (product_name IN ('scissors', 'paper', 'stone')), \r\n\u00a0sale_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,\r\n\u00a0sale_amt DECIMAL(6,2) NOT NULL\r\n\u00a0\u00a0CHECK (sale_amt &gt;= 0.00)\r\nPRIMARY KEY (salesman_name, product_name, sale_date));\r\n\r\nINSERT INTO Sales\r\nVALUES\r\n('Fred', 'scissors', '2016-01-01', 100.00), \r\n('Fred', 'stone', '2016-01-01', 75.98), \r\n('Fred', 'paper', '2016-01-01', 40.00), \r\n('John', 'paper', '2016-01-02', 35.85), \r\n('John', 'paper', '2016-01-03', 0.00), \r\n('John', 'stone', '2016-01-01', 100.00),\r\n('John', 'stone', '2016-01-02', 5.00),\r\n('John', 'stone', '2016-01-04', 4.98), \r\n('Mary', 'paper', '2016-01-11', 45.95), \r\n('Mary', 'scissors', '2016-01-10', 0.00), \r\n('Mary', 'stone', '2016-01-12',20.25), \r\n('Vlad', 'stone', '2016-01-11', 23.95),\r\n('Vlad', 'stone', '2016-01-12', 100.00),\r\n('Vlad', 'stone', '2016-01-13', 75.98),\r\n('Vlad', 'stone', '2016-01-14', 16.85);\r\n\r\n<\/pre>\n<p>Our first problem is to find the salesman who sold all three products<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">SELECT salesman_name\r\n\u00a0FROM Sales \r\nGROUP BY salesman_name\r\nHAVING COUNT (DISTINCT product_name) = 3;\r\n<\/pre>\n<p>This will give us &#8216;Mary&#8217; and &#8220;Fred&#8217; as the salesman who sold the whole catalog. Now, tell me if we have a salesman who has sold only one product. We do not care which product, we are looking for a specialized guy.<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">SELECT salesman_name, MAX(product_name) AS single_product_name\r\n\u00a0FROM Sales \r\nGROUP BY salesman_name\r\nHAVING MIN(product_name) = MAX(product_name); \r\n<\/pre>\n<p>But why do it this way? Why not use &#8220;<code>HAVING COUNT (DISTINCT product_name) = 1<\/code>&#8221; instead?\u00a0 This would look like the first query. The optimizer has to do a scan to count the product names, but the table statistics in every SQL compiler I know about keeps at least the min, max, and row count for computing execution plans. Once I see the min (or max) value for that <i>column,<\/i> I know that I have found the min (or max) value for each <i>salesman<\/i> and can ignore the rest of his values.<\/p>\n<p>\u00a0Try another one: Do we have salesmen who sold over $100.00 or products? The obvious way is:<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">SELECT salesman_name \r\n\u00a0 FROM Sales \r\n\u00a0GROUP BY salesman_name\r\nHAVING SUM(sale_amt) &gt;= 100.00;\r\n\r\n<\/pre>\n<p>We get &#8216;Fred&#8217;, &#8216;Vlad&#8217; and\u00a0 &#8216;John&#8217; as the result set. But there is an alternative.<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">SELECT salesman_name \r\n\u00a0 FROM Sales \r\nGROUP BY salesman_name\r\nHAVING SIGN(SUM(sale_amt) - 100.00) &gt;= 0;\r\n\u00a0\r\n<\/pre>\n<p>In this case, this trick is probably not worth it. The next example applies the techniques in a more complex situation. This example was actually taken from a posting on a SQL forum. Yes, it is a skeleton, so be kind. However, it can be used to demonstrate how to get things into a form where we can use some of the tricks with the <code>GROUP BY<\/code>.<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">CREATE TABLE Foo \r\n(generic_id CHAR (2) NOT NULL PRIMARY KEY, \r\n\u00a0vague_name VARCHAR (20) NOT NULL);\r\n\u00a0\r\nINSERT INTO Foo\r\n\u00a0VALUES ('10', 'Albert'), ('11', 'Bob'), ('12', 'Chuck');\r\n\u00a0\r\nCREATE TABLE Bar\r\n(generic_id CHAR (2) NOT NULL \r\n\u00a0REFERENCES Foo (generic_id), \r\n\u00a0creation_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL, \r\n\u00a0PRIMARY KEY (generic_id, creation_date), \r\n\u00a0assembler_flag BIT);\r\n\u00a0\r\nINSERT INTO Bar \r\nVALUES\r\n\u00a0('10', '2015-01-01', 1),\r\n\u00a0('10', '2015-01-02', 1), \r\n\u00a0('11', '2015-01-01', 0), \r\n\u00a0('11', '2015-01-12', 1), \r\n\u00a0('12', '2015-01-01', 0), \r\n\u00a0('12', '2015-02-02', 0);\r\n\u00a0\r\n<\/pre>\n<p>What I want is the latest <b> creation_date<\/b> for a given name where the <b>assembler_flag<\/b> is zero or if there is no zero <b>assembler_flag<\/b> for the name then the latest <b>creation_date<\/b> for <b>assembler_flag = 1<\/b>.<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">SELECT T.vague_name, T.creation_date, T.assembler_flag \r\n\u00a0 FROM (SELECT Foo.vague_name, Bar.creation_date, Bar.assembler_flag,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ROW_NUMBER()\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0OVER(PARTITION BY Foo.generic_id \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ORDER BY Bar.assembler_flag, Bar.creation_date DESC)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS rn\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0FROM Bar, Foo\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0WHERE Foo.generic_id = Bar.generic_id) AS T\r\nWHERE T.rn = 1;\r\n<\/pre>\n<p>\u00a0The results are summarized here.<\/p>\n<table class=\"MsoNormalTable\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>Albert<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2015-01-02&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>all assembler_flags = 1<\/p>\n<\/td>\n<td valign=\"top\">\n<p>\u00a01<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Bob<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2015-01-01&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>all assembler_flags = 0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>\u00a00<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Chuck<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2015-02-02&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>mixed assembler_flags<\/p>\n<\/td>\n<td valign=\"top\">\n<p>\u00a00<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\u00a0I named the column &#8220;<b>assembler_flag<\/b>&#8221; to stress that we do not use BIT flags in SQL; that was assembler language and this is a predicate based language. But did you notice that Bob and Chuck are treated the same way? We lost information about Chuck having mixed flags. We cannot handle <code>NULL<\/code> flags in this query, either. We also spent time partitioning, sorting and materializing a row number.<\/p>\n<p>\u00a0The usual trick for finding if a grouping has only one value is to use &#8220;<code>HAVING (MIN (assembler_flag) = MAX (assembler_flg))<\/code>&#8220;, but SQL Server gives an <code>error (<\/code><code>Msg 8117, Level 16, State 1, Line 6; Operand data type bit is invalid for min operator<\/code>)!\u00a0 This is yet another reason not to write 1960&#8217;s assembly flag code in SQL. The<code> BIT <\/code>data type in SQL Server was originally a true BIT (0, 1), then it became an exact numeric data type (0, 1, <code>NULL<\/code>) in later releases. But it does not do aggregate functions the way you might expect.<\/p>\n<p>\u00a0We still have to handle <code>NULL<\/code>s without business rules from the poster. Let&#8217;s assume that a <code>NULL<\/code> becomes zero. This gives us the first hack at an answer, with a CTE to handle the <code>NULL<\/code> problem.<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">WITH Build_Groups\r\nAS\r\n\u00a0(SELECT Foo.vague_name, Bar.creation_date, \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0CAST (COALESCE(Bar.assembler_flag, 0) AS INTEGER) AS flg\r\n\u00a0\u00a0\u00a0FROM Foo, Bar\r\n\u00a0\u00a0WHERE Foo.generic_id = Bar.generic_id)\r\n\u00a0\r\nSELECT vague_name, MAX(creation_date) AS recent_creation_date\r\n\u00a0FROM Build_Groups\r\n\u00a0GROUP BY vague_name\r\nHAVING MIN(flg) = MAX(flg);\r\n<\/pre>\n<p>This will give you Albert and Chuck (each has only one flag value), but not Bob (mixed flags). Back to the drawing board! Well, keyboard.<\/p>\n<pre lang:tsql=\"\" theme:none=\"\">WITH Build_Groups\r\nAS\r\n(SELECT Foo.vague_name, Bar.creation_date, \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0CAST (COALESCE (Bar.assembler_flag, 0) AS INTEGER) AS flg\r\n\u00a0\u00a0FROM Foo, Bar\r\n\u00a0WHERE Foo.generic_id = Bar.generic_id)\r\n\u00a0\r\nSELECT vague_name,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0CASE WHEN SUM (flg) = COUNT (flg) THEN 'ones'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0WHEN SUM (flg) = 0 THEN 'zeros'\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0ELSE 'mixed' END\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0AS flag_class\r\n\u00a0FROM Build_Groupso90io90io90io90ib n\r\n\u00a0GROUP BY vague_name; \r\n<\/pre>\n<p>The result can be summarized like this:<\/p>\n<table class=\"MsoNormalTable\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>Albert<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2015-01-02&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>all assembler_flags = 1<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Bob<\/p>\n<\/td>\n<td valign=\"top\">\n<p>&#8216;2015-01-01&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>all assembler_flags = 0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>Chuck<\/p>\n<\/td>\n<td valign=\"top\">\n<p>\u00a0&#8216;2015-02-02&#8217;<\/p>\n<\/td>\n<td valign=\"top\">\n<p>mixed assembler_flag = 0<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>\u00a0The important thing to notice in this example is that the <code>CASE <\/code>expression <code>WHEN<\/code> clauses have to be at the same level of aggregation as the groups. That is why I am using <code>SUM()<\/code> and <code>COUNT()<\/code> in the when clauses. Trivially, constants like &#8216;mixed&#8217; are considered to be at all levels of aggregation, as are computations done on the aggregates.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>It sometimes pays to go back and look at what you think you already know about SQL. Joe Celko gives a quick revision of the GROUP BY and HAVING clauses in SQL that are the bedrock of any sort of analysis of data, and comes up with some nuggets that may not be entirely obvious&hellip;<\/p>\n","protected":false},"author":96214,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[4242,4150,4252],"coauthors":[],"class_list":["post-2165","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-basics","tag-sql","tag-t-sql-programming"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2165","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/96214"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=2165"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2165\/revisions"}],"predecessor-version":[{"id":22861,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/2165\/revisions\/22861"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=2165"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=2165"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=2165"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=2165"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}