Summarizing Values: `GROUP BY` Clause and Aggregate Functions

So far, the examples presented have shown how to retrieve and manipulate values from individual rows in a table. In this section, we will illustrate how summary information can be obtained from groups of rows in a table.

Often we find it useful to group data by some characteristic of the group, such as department or division, or benefit level, so that summary statistics about the group (totals, averages, etc.) can be calculated. For example, to calculate average departmental salaries, the user could group the salaries of all employees by department. In Texis, the GROUP BY clause is used to divide the rows of a table into groups that have matching values in one or more columns. The form of this clause is:

GROUP BY   column-name1 [,column-name2] ...

and it fits into the SELECT expression in the following manner.

SELECT     column-name1 [,column-name2] ...
     FROM       table-name
     [WHERE     search-condition]
     [GROUP BY  column-name1 [,column-name2] ... ]
     [ORDER BY  column-name1 [DESC] [,column-name2] [DESC] ] ... ;

The column(s) listed in the GROUP BY clause are used to form groups. The grouping is based on rows with the same value in the specified column or columns being placed in the same group. It is important to note that grouping is conceptual; the table is not physically rearranged.

As an extension Texis also allows the GROUP BY clause to consist of expressions instead of just column names. This should be used with caution, and the same expression should be used in the SELECT as in the GROUP BY clause. This is especially true if the expression will fold multiple values together, such as dividing a number by 1000 to group quantities together if they are in the same 1000. If you select SALARY, and GROUP BY SALARY/1000 you will see one sample salary from the matching group.

The GROUP BY clause is normally used along with five built-in, or "aggregate" functions. These functions perform special operations on an entire table or on a set, or group, of rows rather than on each row and then return one row of values for each group.

Table here lists the aggregate functions available with Texis.

Function Name	Meaning	Example
SUM(column name)	Total of the values in a numeric column	`SUM(SALARY)`
AVG(column name)	Average of the values in a column	`AVG(SALARY)`
MAX(column name)	Largest value in a column	`MAX(SALARY)`
MIN(column name)	Smallest value in a column	`MIN(SALARY)`
COUNT(*)	Count of the number of rows selected	`COUNT(*)`

Table: Texis Aggregate Function Names

Aggregate functions are used in place of column names in the SELECT statement. The form of the function is:

Function name ([DISTINCT] argument)

In all situations the argument represents the column name to which the function applies. For example, if the sum of all salaries is needed, then the function SUM is used and the argument is the column SALARY. When COUNT is used an asterisk (*) can be placed within the parentheses instead of a column name to count all the rows without regard to field.

If the DISTINCT keyword is used then only the unique values are processed. This is most useful with COUNT to find the number of unique values. If you use DISTINCT then you must supply a column name. DISTINCT will work with the other aggregate functions, although there is typically very little need for them. The DISTINCT feature was added in version 4.00.1002000000

Example: What is the average salary paid in each department?

Enter this statement:

SELECT     DEPT, AVG(SALARY)
     FROM       EMPLOYEE
     GROUP BY   DEPT ;

Syntax Notes:

AVG is the aggregate function name.
(SALARY) is the column on which the average is computed.
DEPT is the column by which the rows will be grouped.

The above statement will produce the following results:

DEPT      AVG(SALARY)

  MKT       33500
  MGT       45000
  LIB       22000
  RND       27500
  FIN       42000

In this query, all rows in the EMPLOYEE table that have the same department codes are grouped together. The aggregate function AVG is calculated for the salary column in each group. The department code and the average departmental salary are displayed for each department.

A SELECT clause that contains an aggregate function cannot contain any column name that does not apply to a group; for example:

The statement:

SELECT     ENAME, AVG(SALARY)
     FROM       EMPLOYEE
     GROUP BY   DEPT ;

results in the message

Error at Line 1: Not a GROUP BY Expression

It is not permissible to include column names in a SELECT clause that are not referenced in the GROUP BY clause. The only column names that can be displayed, along with aggregate functions, must be listed in the GROUP BY clause. Since ENAME is not included in the GROUP BY clause, an error message results.

Example: The chair of the Marketing Department plans to participate in a national salary survey for employees in Marketing Departments. Determine the average salary paid to the Marketing Department employees.

This statement:

SELECT     COUNT(*), AVG(SALARY)
     FROM       EMPLOYEE
     WHERE      DEPT = 'MKT'

Results in:

COUNT(*)   AVG(SALARY)

  2          33500

In this example, the aggregate function AVG is used in a SELECT statement that has a WHERE clause. Texis selects the rows that represent Marketing Department employees and then applies the aggregate function to these rows.

You can divide the rows of a table into groups based on values in more than one column. For example, you might want to compute total salary by department and then, within a department, want subtotals by benefits classification.

Example: What is the total salary paid by benefits classification in each department?

Enter this statement:

SELECT     DEPT, BENEFITS, SUM(SALARY)
     FROM       EMPLOYEE
     GROUP BY   DEPT, BENEFITS ;

In this example, we are grouping by department, and within department, by benefits classification.

We'll get the following results:

DEPT      BENEFITS    SUM(SALARY)

  FIN       FULL        42000
  LIB       PART        22000
  MGT       FULL        45000
  MKT       FULL        67000
  RND       FULL        30000
  RND       PART        25000

In this query, the rows are grouped by department and, within each department, employees with the same benefits are grouped so that totals can be computed. Notice that the columns DEPT and BENEFITS can appear in the SELECT statement since both columns appear in the GROUP BY clause.

If the GROUP BY clause is omitted when an aggregate function is used, then the entire table is considered as one group, and the group function displays a single value for the entire table.

Example: What is the total salary paid to all employees?

The statement:

SELECT     SUM(SALARY)
     FROM       EMPLOYEE ;

results in:

SUM(SALARY)

  231000

Summarizing Values: GROUP BY Clause and Aggregate Functions

Summarizing Values: `GROUP BY` Clause and Aggregate Functions