# Introduction to User-Defined Aggregate Functions

OK, let’s speed past the easy bits, the parts we all already know: standard aggregate functions. Aggregate functions, unlike regular functions, take values from multiple rows as their input. The category includes those aggregate functions that are so ordinary they’re almost invisible – SUM, COUNT, MAX – and a couple that most of us never use – such as APPROX_COUNT_DISTINCT

OK, let’s speed past the easy bits, the parts we all already know: standard aggregate functions. Aggregate functions, unlike regular functions, take values from multiple rows as their input. The category includes those aggregate functions that are so ordinary they’re almost invisible – SUM, COUNT, MAX – and a couple that most of us never use – such as APPROX_COUNT_DISTINCT.

In spite of such arcane functions there are always gaps that need plugging (indeed functions as un-extraordinary as median and mode were only added at 10g). I’m sure the guys at Oracle are all coffee-addled geniuses who work 24 and a half hours every day, but they’ll never be able to add every aggregate function to meet every single need.

##### User-Defined Aggregate Functions

Which is why, with 9i, the ability to create our own aggregate functions was added to Oracle. Before this, you’d have needed to write PL/SQL to replicate the capabilities of an aggregate function. Now you can create your own aggregate function, and call it directly from your SQL.

However, before we go crazy and dive right in, let’s agree on a problem to solve with our user-defined aggregate function. Let’s imagine that the standard EMP and DEPT tables belong to a small company that we run together, Scott Tiger Enterprises. Let’s also imagine that the government has just introduced a new law stating that the tax our company will pay is equal to the sum of the three largest salaries we pay our employees. An aggregate function will be perfect for determining the figure.

##### The ODCI Aggregate Routines

To write a user-defined aggregate function, we must have a passing acquaintance with Oracle Data Cartridges. Data cartridges, using the Oracle Extensibility Architecture framework, extend the abilities of the Oracle server.  Don’t worry too much about it at this point; just know that we’ll be using an API from the data cartridge (ODCI) arsenal to build our aggregate function.

The 4 ODCI functions we will use are as follows:

• ODCIAggregateInitialize(ctx IN OUT our_aggregate_object_type) RETURN NUMBER
This is the function within which we initialize our persistent variables.  It executes once for each distinct member of the group by field (or just once, if your eventual query does not contain a group by).
• ODCIAggregateIterate(self IN OUT our_aggregate_object_type, our_parameter_to_aggregate IN some_data_type) RETURN NUMBER
This function executes once for each row returned by the query. This is where the meat of our aggregate function will live.
• ODCIAggregateMerge(self IN OUT our_aggregate_object_type, ctx2 IN our_aggregate_object_type) RETURN NUMBER
For our purposes here, let’s just say that the merge function simply returns ODCIconst.success (a numeric constant). There’s more to it, but it’s outside the scope of what we’re talking about today. You can build your aggregate function without knowing it.
• ODCIAggregateTerminate(self IN our_aggregate_object_type, returnvalue OUT some_data_type, flags IN NUMBER) RETURN NUMBER
This function executes once for each distinct member of the group by field.

To use these functions, we will need to create an object type to implement them.  And then, finally, we will create our user-defined aggregate function using our object.

It might all seem a little overwhelming at first, but, in truth, the steps are simple:

1. Create an object spec containing any persistent variables we might need, and declaring ODCIAggregateInitialize, ODCIAggregateIterate, ODCIAggregateMerge and ODCIAggregateTerminate.
2. Create an object body containing the bodies of your ODCI functions. Bear in mind that:
• ODCIAggregateInitialize will run once at the start of the process;
• ODCIAggregateIterate runs for each row, and is therefore where you will do all of your work;
• and ODCIAggregateTerminate runs once at the end of the whole process.
##### Example User-Defined Aggregate Function

An example will probably help. Let’s create an aggregate function, Taxbill, to find out how much  Scott Tiger Enterprises will have to pay. Remember it’s the sum of the three highest salaries on our books.

First we need to create our object spec. Let’s call it newTaxFncs

Now we need to write our object type body. Remember that ODCIAggregateInitialize initializes our variables, vTaxBill, vHighest, v2ndHigh and v3rdHigh; ODCIAggregateIterate does all the legwork; and ODCIAggregateTerminate returns the final figure.

Finally, we need to create our aggregate function using the object that we’ve just created. Pay attention to the syntax ‘cos it’s slightly different from that of a standard function.

Now we can use it. Just like any other aggregate function. How cool is that?

 Taxbill 11000

 DEPTNO Taxbill 10 8750 20 8975 30 5950
##### Finally

I feel there are a couple of things I need to mention before we move on.

1. Our example user-defined aggregate function was number-based, but Oracle allows them accept VARCHAR2, dates, and other scalar data types, and complex ones too.
2. Our object type is built on PL/SQL; however, we could have used any supported language, such as C, C++ or Java. I’m illiterate in those languages, so it was an easy decision for me.
##### Analytic Functions

Remember back in the day when all new televisions had stickers that read HD-ready plastered all over them? Well, I feel that user-defined aggregate functions should come with an “Analytic Function-ready” sticker on them. That is because you can use any aggregate functions you build as analytic functions.

Imagine we wanted to see how our tax bill has changed over the years with each new member of staff.

 ENAME JOB HIREDATE SAL Taxbill SMITH CLERK 12/17/1980 800 800 ALLEN SALESMAN 02/20/1981 1600 2400 WARD SALESMAN 02/22/1981 1250 3650 JONES MANAGER 04/02/1981 2975 5825 BLAKE MANAGER 05/01/1981 2850 7425 CLARK MANAGER 06/09/1981 2450 8275 TURNER SALESMAN 09/08/1981 1500 8275 MARTIN SALESMAN 09/28/1981 1250 8275 KING PRESIDENT 11/17/1981 5000 10825 FORD ANALYST 12/03/1981 3000 10975

Or imagine we wanted to pay our tax department by department.

 ENAME JOB SAL DEPTNO Taxbill CLARK MANAGER 2450 10 8750 MILLER CLERK 1300 10 8750 KING PRESIDENT 5000 10 8750 FORD ANALYST 3000 20 8975 SCOTT ANALYST 3000 20 8975 JONES MANAGER 2975 20 8975 SMITH CLERK 800 20 8975 ADAMS CLERK 1100 20 8975 WARD SALESMAN 1250 30 5950 MARTIN SALESMAN 1250 30 5950
##### Conclusion

So that’s it.

Once created, your aggregate functions will slide in right beside Oracle’s standard ones, and enjoy all the advantages – and disadvantages – that they have.  Try running one against an unindexed column in a very large table, and you’ll suffer a performance hit. Unsurprisingly.  Use one to condense a very complicated process into a single line of SQL, and you’ll enjoy a readability bonus. Again, unsurprisingly.

So, go forth, and use your new knowledge wisely.