# Summarizing Values: GROUP BY Clause and Aggregate Functions

So far, the examples presented have shown how to retrieve and manipulate values from individual rows in a table. In this section, we will illustrate how summary information can be obtained from groups of rows in a table.

Often we find it useful to group data by some characteristic of the group, such as department or division, or benefit level, so that summary statistics about the group (totals, averages, etc.) can be calculated. For example, to calculate average departmental salaries, the user could group the salaries of all employees by department. In Texis, the GROUP BY clause is used to divide the rows of a table into groups that have matching values in one or more columns. The form of this clause is:

`GROUP BY   column-name1 [,column-name2] ...`
and it fits into the `SELECT` expression in the following manner.
```SELECT     column-name1 [,column-name2] ...
FROM       table-name
[WHERE     search-condition]
[GROUP BY  column-name1 [,column-name2] ... ]
[ORDER BY  column-name1 [DESC] [,column-name2] [DESC] ] ... ;```

The column(s) listed in the GROUP BY clause are used to form groups. The grouping is based on rows with the same value in the specified column or columns being placed in the same group. It is important to note that grouping is conceptual; the table is not physically rearranged.

As an extension Texis also allows the GROUP BY clause to consist of expressions instead of just column names. This should be used with caution, and the same expression should be used in the `SELECT` as in the GROUP BY clause. This is especially true if the expression will fold multiple values together, such as dividing a number by 1000 to group quantities together if they are in the same 1000. If you select SALARY, and GROUP BY SALARY/1000 you will see one sample salary from the matching group.

The GROUP BY clause is normally used along with five built-in, or "aggregate" functions. These functions perform special operations on an entire table or on a set, or group, of rows rather than on each row and then return one row of values for each group.

Table here lists the aggregate functions available with Texis.

 Function Name Meaning Example SUM(column name) Total of the values in a numeric column `SUM(SALARY)` AVG(column name) Average of the values in a column `AVG(SALARY)` MAX(column name) Largest value in a column `MAX(SALARY)` MIN(column name) Smallest value in a column `MIN(SALARY)` COUNT(*) Count of the number of rows selected `COUNT(*)`

Table: Texis Aggregate Function Names

Aggregate functions are used in place of column names in the `SELECT` statement. The form of the function is:

`Function name ([DISTINCT] argument)`

In all situations the argument represents the column name to which the function applies. For example, if the sum of all salaries is needed, then the function SUM is used and the argument is the column SALARY. When COUNT is used an asterisk (*) can be placed within the parentheses instead of a column name to count all the rows without regard to field.

If the DISTINCT keyword is used then only the unique values are processed. This is most useful with COUNT to find the number of unique values. If you use DISTINCT then you must supply a column name. DISTINCT will work with the other aggregate functions, although there is typically very little need for them. The DISTINCT feature was added in version 4.00.1002000000

Example: What is the average salary paid in each department?

Enter this statement:

```SELECT     DEPT, AVG(SALARY)
FROM       EMPLOYEE
GROUP BY   DEPT ;```
Syntax Notes:

• `AVG` is the aggregate function name.

• `(SALARY)` is the column on which the average is computed.

• `DEPT` is the column by which the rows will be grouped.

The above statement will produce the following results:

```DEPT      AVG(SALARY)

MKT       33500
MGT       45000
LIB       22000
RND       27500
FIN       42000```

In this query, all rows in the `EMPLOYEE` table that have the same department codes are grouped together. The aggregate function AVG is calculated for the salary column in each group. The department code and the average departmental salary are displayed for each department.

A `SELECT` clause that contains an aggregate function cannot contain any column name that does not apply to a group; for example:

The statement:

```SELECT     ENAME, AVG(SALARY)
FROM       EMPLOYEE
GROUP BY   DEPT ;```
results in the message
`Error at Line 1: Not a GROUP BY Expression`

It is not permissible to include column names in a `SELECT` clause that are not referenced in the GROUP BY clause. The only column names that can be displayed, along with aggregate functions, must be listed in the GROUP BY clause. Since `ENAME` is not included in the GROUP BY clause, an error message results.

Example: The chair of the Marketing Department plans to participate in a national salary survey for employees in Marketing Departments. Determine the average salary paid to the Marketing Department employees.

This statement:

```SELECT     COUNT(*), AVG(SALARY)
FROM       EMPLOYEE
WHERE      DEPT = 'MKT'```
Results in:

```COUNT(*)   AVG(SALARY)

2          33500```

In this example, the aggregate function AVG is used in a `SELECT` statement that has a `WHERE` clause. Texis selects the rows that represent Marketing Department employees and then applies the aggregate function to these rows.

You can divide the rows of a table into groups based on values in more than one column. For example, you might want to compute total salary by department and then, within a department, want subtotals by benefits classification.

Example: What is the total salary paid by benefits classification in each department?

Enter this statement:

```SELECT     DEPT, BENEFITS, SUM(SALARY)
FROM       EMPLOYEE
GROUP BY   DEPT, BENEFITS ;```
In this example, we are grouping by department, and within department, by benefits classification.

We'll get the following results:

```DEPT      BENEFITS    SUM(SALARY)

FIN       FULL        42000
LIB       PART        22000
MGT       FULL        45000
MKT       FULL        67000
RND       FULL        30000
RND       PART        25000```

In this query, the rows are grouped by department and, within each department, employees with the same benefits are grouped so that totals can be computed. Notice that the columns DEPT and BENEFITS can appear in the `SELECT` statement since both columns appear in the GROUP BY clause.

If the GROUP BY clause is omitted when an aggregate function is used, then the entire table is considered as one group, and the group function displays a single value for the entire table.

Example: What is the total salary paid to all employees?

The statement:

```SELECT     SUM(SALARY)
FROM       EMPLOYEE ;```
results in:

```SUM(SALARY)

231000```