January 25, 2021
Hot Topics:

Introductory Transact-SQL

  • By Addison Wesley
  • Send Email »
  • More Articles »

Grouping Data

Since SQL is a set-oriented query language, statements that group or summarize data are its bread and butter. In conjunction with aggregate functions, they are the means by which the real work of SQL queries is performed. Developers familiar with DBMS products that lean more toward single-record handling find this peculiar because they are accustomed to working with data one row at a time. Generating summary information by looping through a table is a common technique in older database products—but not in SQL Server. A single SQL statement can perform tasks that used to require an entire COBOL program to complete. This magic is performed using SELECT's GROUP BY clause and Transact-SQL aggregate functions. Here's an example:

SELECT customers.CustomerNumber,SUM(orders.Amount) AS TotalOrdersFROM customers JOIN ordersON customers.CustomerNumber=orders.CustomerNumberGROUP BY customers.CustomerNumber

This query returns a list of all customers and the total amount of each customer's orders.

How do you know which fields to include in the GROUP BY clause? You must include all the items in the SELECT statement's column list that are not aggregate functions or absolute values. Take the following SELECT statement:

-- Bad SQL - don't do thisSELECT customers.CustomerNumber, customers.LastName,SUM(orders.Amount) AS TotalOrdersFROM customers JOIN ordersON customers.CustomerNumber=orders.CustomerNumberGROUP BY customers.CustomerNumber

This query won't execute because it's missing a column in the GROUP BY clause. Instead, it should read:

GROUP BY customers.CustomerNumber, customers.LastName

Note that the addition of the LastName column doesn't really affect the results since CustomerNumber is a unique key. That is, including LastName as a GROUP BY column won't cause any additional grouping levels to be produced since there is only one LastName for each CustomerNumber.


The HAVING clause is used to limit the rows returned by a SELECT with GROUP BY. Its relationship to GROUP BY is similar to the relationship between the WHERE clause and the SELECT itself. Like the WHERE clause, it restricts the rows returned by a SELECT statement. Unlike WHERE, it operates on the rows in the result set rather than the rows in the query's tables. Here's the previous query modified to include a HAVING clause:

SELECT customers.CustomerNumber,       customers.LastName, SUM(orders.Amount)AS TotalOrdersFROM customers JOIN ordersON customers.CustomerNumber=orders.CustomerNumberGROUP BY customers.CustomerNumber, customers.LastNameHAVING SUM(orders.Amount) > 700CustomerNumber LastName TotalOrders-------------- -------- -----------3              Citizen  86753.091              Doe      802.35

There is often a better way of qualifying a query than by using a HAVING clause. In general, HAVING is less efficient than WHERE because it qualifies the result set after it's been organized into groups; WHERE does so beforehand. Here's an example that improperly uses the HAVING clause:

-- Bad SQL - don't do thisSELECT customers.LastName, COUNT(*) AS NumberWithNameFROM customersGROUP BY customers.LastNameHAVING customers.LastName<>'Citizen'

Properly written, this query's filter criteria should be in its WHERE clause, like so:

SELECT customers.LastName, COUNT(*) AS NumberWithNameFROM customersWHERE customers.LastName<> 'Citizen'GROUP BY customers.LastName

In fact, SQL Server recognizes this type of HAVING misuse and translates HAVING into WHERE during query execution. Regardless of whether SQL Server catches errors like these, it's always better to write optimal code in the first place.

Ordering Data

The ORDER BY clause is used to order the rows returned by a query. It follows the WHERE and GROUP BY clauses (if they exist) and sorts the result set just prior to returning it. Here's an example:

SELECT LastName, StateFROM customersORDER BY State

Here's another example:

SELECT FirstName, LastNameFROM customersORDER BY LastName DESC

Note the use of the DESC keyword to sort the rows in descending order. If not directed otherwise, ORDER BY always sorts in ascending order.

Page 8 of 10

This article was originally published on May 7, 2003

Enterprise Development Update

Don't miss an article. Subscribe to our newsletter below.

Thanks for your registration, follow us on our social networks to keep up-to-date