OK, let’s speed past the easy bits, the parts we all already know: standard aggregate functions. Aggregate functions, unlike regular functions, take values from multiple rows as their input. The category includes those aggregate functions that are so ordinary they’re almost invisible – SUM, COUNT, MAX – and a couple that most of us never use – such as APPROX_COUNT_DISTINCT.
In spite of such arcane functions there are always gaps that need plugging (indeed functions as un-extraordinary as median and mode were only added at 10g). I’m sure the guys at Oracle are all coffee-addled geniuses who work 24 and a half hours every day, but they’ll never be able to add every aggregate function to meet every single need.
User-Defined Aggregate Functions
Which is why, with 9i, the ability to create our own aggregate functions was added to Oracle. Before this, you’d have needed to write PL/SQL to replicate the capabilities of an aggregate function. Now you can create your own aggregate function, and call it directly from your SQL.
However, before we go crazy and dive right in, let’s agree on a problem to solve with our user-defined aggregate function. Let’s imagine that the standard EMP and DEPT tables belong to a small company that we run together, Scott Tiger Enterprises. Let’s also imagine that the government has just introduced a new law stating that the tax our company will pay is equal to the sum of the three largest salaries we pay our employees. An aggregate function will be perfect for determining the figure.
The ODCI Aggregate Routines
To write a user-defined aggregate function, we must have a passing acquaintance with Oracle Data Cartridges. Data cartridges, using the Oracle Extensibility Architecture framework, extend the abilities of the Oracle server. Don’t worry too much about it at this point; just know that we’ll be using an API from the data cartridge (ODCI) arsenal to build our aggregate function.
The 4 ODCI functions we will use are as follows:
- ODCIAggregateInitialize(ctx IN OUT our_aggregate_object_type) RETURN NUMBER
This is the function within which we initialize our persistent variables. It executes once for each distinct member of the group by field (or just once, if your eventual query does not contain a group by). - ODCIAggregateIterate(self IN OUT our_aggregate_object_type, our_parameter_to_aggregate IN some_data_type) RETURN NUMBER
This function executes once for each row returned by the query. This is where the meat of our aggregate function will live. - ODCIAggregateMerge(self IN OUT our_aggregate_object_type, ctx2 IN our_aggregate_object_type) RETURN NUMBER
For our purposes here, let’s just say that the merge function simply returns ODCIconst.success (a numeric constant). There’s more to it, but it’s outside the scope of what we’re talking about today. You can build your aggregate function without knowing it. - ODCIAggregateTerminate(self IN our_aggregate_object_type, returnvalue OUT some_data_type, flags IN NUMBER) RETURN NUMBER
This function executes once for each distinct member of the group by field.
To use these functions, we will need to create an object type to implement them. And then, finally, we will create our user-defined aggregate function using our object.
It might all seem a little overwhelming at first, but, in truth, the steps are simple:
- Create an object spec containing any persistent variables we might need, and declaring ODCIAggregateInitialize, ODCIAggregateIterate, ODCIAggregateMerge and ODCIAggregateTerminate.
- Create an object body containing the bodies of your ODCI functions. Bear in mind that:
- ODCIAggregateInitialize will run once at the start of the process;
- ODCIAggregateIterate runs for each row, and is therefore where you will do all of your work;
- and ODCIAggregateTerminate runs once at the end of the whole process.
- Create your aggregate function based on your object.
Example User-Defined Aggregate Function
An example will probably help. Let’s create an aggregate function, Taxbill, to find out how much Scott Tiger Enterprises will have to pay. Remember it’s the sum of the three highest salaries on our books.
First we need to create our object spec. Let’s call it newTaxFncs
1 2 3 4 5 6 7 8 9 10 11 |
CREATE OR REPLACE TYPE newTaxFncs AS OBJECT ( vTaxBill NUMBER, vHighest NUMBER, v2ndHigh NUMBER, v3rdHigh NUMBER, STATIC FUNCTION ODCIAggregateInitialize (ctx IN OUT newTaxFncs) RETURN NUMBER, MEMBER FUNCTION ODCIAggregateIterate (self IN OUT newTaxFncs, pSal IN NUMBER) RETURN NUMBER, MEMBER FUNCTION ODCIAggregateMerge(self IN OUT newTaxFncs, ctx2 newTaxFncs) RETURN NUMBER, MEMBER FUNCTION ODCIAggregateTerminate (self IN newTaxFncs, taxVal OUT NUMBER, flags IN NUMBER) RETURN NUMBER ); |
Now we need to write our object type body. Remember that ODCIAggregateInitialize initializes our variables, vTaxBill, vHighest, v2ndHigh and v3rdHigh; ODCIAggregateIterate does all the legwork; and ODCIAggregateTerminate returns the final figure.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
CREATE OR REPLACE TYPE BODY newTaxFncs IS STATIC FUNCTION ODCIAggregateInitialize (ctx IN OUT newTaxFncs) RETURN NUMBER IS BEGIN -- Initialize our variables: vTaxBill, vHighest, v2ndHigh, v3rdHigh ctx := newTaxFncs(0,0,0,0); RETURN ODCIConst.success; END ODCIAggregateInitialize; MEMBER FUNCTION ODCIAggregateIterate (self IN OUT newTaxFncs, pSal IN NUMBER) RETURN NUMBER IS BEGIN -- If this is one of the top 3 salaries, save it in one of the variables. IF pSal >= self.vHighest THEN self.v3rdHigh := self.v2ndHigh; self.v2ndHigh := self.vHighest; self.vHighest := pSal; ELSIF pSal >= self.v2ndHigh THEN self.v3rdHigh := self.v2ndHigh; self.v2ndHigh := pSal; ELSIF pSal >= self.v3rdHigh THEN self.v3rdHigh := pSal; END IF; RETURN ODCIconst.success; END ODCIAggregateIterate; MEMBER FUNCTION ODCIAggregateMerge (self IN OUT newTaxFncs, ctx2 IN newTaxFncs) RETURN NUMBER IS BEGIN RETURN ODCIconst.success; END ODCIAggregateMerge; MEMBER FUNCTION ODCIAggregateTerminate (self IN newTaxFncs, taxVal OUT NUMBER, flags IN NUMBER) RETURN NUMBER IS BEGIN -- The variable returned OUT by Terminate is what the aggregate function will use. taxVal := self.vHighest + self.v2ndHigh + self.v3rdHigh; RETURN ODCIconst.success; END ODCIAggregateTerminate; END; / |
Finally, we need to create our aggregate function using the object that we’ve just created. Pay attention to the syntax ‘cos it’s slightly different from that of a standard function.
1 2 3 |
CREATE OR REPLACE FUNCTION taxbill (pSal NUMBER) AS "Taxbill" RETURN NUMBER AGGREGATE USING newTaxFncs; |
Now we can use it. Just like any other aggregate function. How cool is that?
1 2 |
SELECT taxbill(sal) FROM emp; |
Taxbill |
11000 |
1 2 3 |
SELECT deptno, taxbill(sal) AS "Taxbill" FROM emp GROUP BY deptno; |
DEPTNO | Taxbill |
10 | 8750 |
20 | 8975 |
30 | 5950 |
Finally
I feel there are a couple of things I need to mention before we move on.
- Our example user-defined aggregate function was number-based, but Oracle allows them accept VARCHAR2, dates, and other scalar data types, and complex ones too.
- Our object type is built on PL/SQL; however, we could have used any supported language, such as C, C++ or Java. I’m illiterate in those languages, so it was an easy decision for me.
Analytic Functions
Remember back in the day when all new televisions had stickers that read HD-ready plastered all over them? Well, I feel that user-defined aggregate functions should come with an “Analytic Function-ready” sticker on them. That is because you can use any aggregate functions you build as analytic functions.
Imagine we wanted to see how our tax bill has changed over the years with each new member of staff.
1 2 3 |
SELECT ename, job, hiredate, sal, taxbill(sal) OVER (order by hiredate) AS "Taxbill" FROM emp ORDER BY hiredate; |
ENAME | JOB | HIREDATE | SAL | Taxbill |
SMITH | CLERK | 12/17/1980 | 800 | 800 |
ALLEN | SALESMAN | 02/20/1981 | 1600 | 2400 |
WARD | SALESMAN | 02/22/1981 | 1250 | 3650 |
JONES | MANAGER | 04/02/1981 | 2975 | 5825 |
BLAKE | MANAGER | 05/01/1981 | 2850 | 7425 |
CLARK | MANAGER | 06/09/1981 | 2450 | 8275 |
TURNER | SALESMAN | 09/08/1981 | 1500 | 8275 |
MARTIN | SALESMAN | 09/28/1981 | 1250 | 8275 |
KING | PRESIDENT | 11/17/1981 | 5000 | 10825 |
FORD | ANALYST | 12/03/1981 | 3000 | 10975 |
Or imagine we wanted to pay our tax department by department.
1 2 3 |
SELECT ename, job, sal, deptno, taxbill(sal) OVER (PARTITION BY deptno) AS "Taxbill" FROM emp ORDER BY deptno; |
ENAME | JOB | SAL | DEPTNO | Taxbill |
CLARK | MANAGER | 2450 | 10 | 8750 |
MILLER | CLERK | 1300 | 10 | 8750 |
KING | PRESIDENT | 5000 | 10 | 8750 |
FORD | ANALYST | 3000 | 20 | 8975 |
SCOTT | ANALYST | 3000 | 20 | 8975 |
JONES | MANAGER | 2975 | 20 | 8975 |
SMITH | CLERK | 800 | 20 | 8975 |
ADAMS | CLERK | 1100 | 20 | 8975 |
WARD | SALESMAN | 1250 | 30 | 5950 |
MARTIN | SALESMAN | 1250 | 30 | 5950 |
Conclusion
So that’s it.
Once created, your aggregate functions will slide in right beside Oracle’s standard ones, and enjoy all the advantages – and disadvantages – that they have. Try running one against an unindexed column in a very large table, and you’ll suffer a performance hit. Unsurprisingly. Use one to condense a very complicated process into a single line of SQL, and you’ll enjoy a readability bonus. Again, unsurprisingly.
So, go forth, and use your new knowledge wisely.
Load comments