SQL Server now has the ANSI/ISO Standard CREATE SEQUENCE
statement. Hooray! The bad news is that most programmers will not be aware of the significance of the underlying math. A SEQUENCE
is not an IDENTITY
. That propriety feature in T-SQL is a table property inherited from the old Sybase/UNIX days. It counts the number of insertion attempts, not even successes, on one particular machine. This goes back to how files were managed on UNIX systems in the 1970s. In those systems, you needed record numbers to locate the data, so T-SQL exposed this count as IDENTITY
. Also, when a record was deleted, a bit flag on each record might be set to signal a garbage collection routine to remove this data during a file utility clean up.
Obviously, IDENTITY
increments its value from physical insertion to physical insertion; no big jumps, no gaps. Again, IDENTITY
is not a column; it is a table property. That means it cannot be a key by definition. Remember a key is a subset of columns that are unique in every row of a table. It is not an integer; you cannot do meaningful math on it. It is non-deterministic. If you insert a set of more than one row into a table, the rows will be inserted one at time in an order determined by the indexes, internal state of the hardware and number of sessions open at the time.
We have, for a long time, created a table of integers that has all sorts of uses. It starts at 1 and goes to some sufficiently large value. Its name depends on the author of the table. You will see “Numbers”, “Sequence”, “Tally_Table” and “Series” among others. Obviously, we can no longer use “Sequence”; it is a reserved word! I used it myself, and so I’ll have to re-write my books from now on. Darn. I always thought that “Numbers” was too generic; which numbers?? I never liked “Tally”, since a tally is a current score or amount, a running total, or count. It is not a set, but a process. I settled on “Series” for my stuff since a series is a completed (infinite) set, but it is not the right word.
Now, let’s do math.
What is the Difference Between a Sequence and a Series?
A sequence is a list of numbers. The order in which the numbers are listed is important, so for instance {1, 2, 3, 4, 5, …} is one sequence, and {2, 1, 4, 3, 6, 5, …} is an entirely different sequence. This is important. It means we can talk about the i-th member of the sequence.
A series is a sum of numbers. A classic example is: 1 + 1/2 + 1/4 + 1/8 + 1/16 + … which converge to 2 (or you can loosely say it will equal 2 at infinite). The order of terms can change the limit to which the series converges or it might not (“absolutely convergent”). Okay, too much math for a database article.
Sequences and series are closely related to each other. But a sequence is not summed.
Tag Numbers versus Sequence
A tag number is string, usually of fixed length, made up of digits. The digits are in sequence to make them easy to sort and to generate. Think of a serial number on a manufactured product. The most common example is a vehicle identification number (VIN). This is a unique code is used by the automotive industry to identify individual motor vehicles, as defined in ISO 3833. The positions 12 to 17 are a sequence prefixed by codes for the Manufacturer Identifier, some vehicle attributes, a check digit, model year, and plant code.
The check digit is a major reason we like tag numbers. This is a topic in itself, but the idea is that we can take each digit in the tag number and run it thru a formula as an integer. We get a result, usually another digit, and we attach it to the tag number. For example, the Luhn algorithm is a common method
defined by ISO/IEC 7812-1. We like it because it is simple to put in hardware.
1. Compute the sum of the digits .
2. Take the units digit from this total.
3. Subtract it from 10.
It’s not a great check digit, but it catches most of the common input errors – missing digits, extra digits, wrong digits and pairwise transposes.
Now let’s into the SQL.
The CREATE SEQUENCE Statement
This statement creates a schema level object that is accessible to any user. It is not part of a table. It is not a procedure that belong to one user. The idea is that a user can invoke the SEQUENCE
with a special syntax that is used wherever an integer value of the SEQUENCE
data type would work.
If you want a physical model, imagine you are in the butcher store. You walk in and pull a service ticket number from a roll of tickets on the counter. Sequence numbers are generated outside the scope of the current transaction, just like the tickets. The numbers are consumed whether the transaction using the ticket number is actually served (committed) or they walk out (rolled back).
1 2 |
NEXT VALUE FOR [<database_name>.] [<schema_name>.] <sequence_name> [OVER (<over_order_by_clause>)] |
I will explain the OVER()
clause later; just keep the simple butcher store model in mind for now. Let’s look at the roll of tickets (i.e. CREATE SEQUENCE
). It has to be declared with parameters that define behavior.
1 2 3 4 5 6 7 8 9 10 |
CREATE SEQUENCE [<schema_name>.] <sequence_name> [AS [built_in_integer_type | user-defined_integer_type]] [START WITH <constant>] [INCREMENT BY <constant>] [{MINVALUE [<constant>]} | {NO MINVALUE}] [{MAXVALUE [<constant>]} | {NO MAXVALUE}] [CYCLE | {NO CYCLE}] [{CACHE [<constant>]} | {NO CACHE}] [;] |
Let’s go thru the BNF in detail. The <sequence_name>
and its qualifiers explains itself. This is pure SQL. Sequence numbers are generated outside the scope of the current transaction. They are consumed whether the transaction using the sequence number is committed or rolled back. Think about the butcher shop ticket that gets dropped on the floor.
A sequence can be defined as any integer type: That means TINYINT, SMALLINT, INTEGER and BIGINT;
but it also allows DECIMAL (s, 0)
and NUMERIC(s, 0)
data types. You can also a user-defined data type that is based on one of the allowed types. Do not do that if you value portability.
If no data type is provided, BIGINT
is the default. Do not use this unless you really need a range of -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807), which makes ‘Big Data’ seem like note on the back of the hand. As with most defaults, this is the largest, safest value in the problem space.
1 |
START WITH <constant> |
This is the first value returned by the sequence. The START value must be a value between the minimum and maximum values. The default start value for a new sequence is the minimum value for an ascending sequence and the maximum value for a descending sequence. Note this is a constant; no function calls.
1 |
INCREMENT BY <constant> |
Value used to increment (or decrement if negative) the value of the sequence for each call to the NEXT VALUE FOR
. If the increment is a negative value, the sequence is descending; otherwise, it is ascending. The increment cannot be 0, obviously. The default increment for a new sequence is 1. Again, this is a constant and not a function call.
1 |
[MINVALUE <constant> | NO MINVALUE] |
Specifies the bounds for the sequence. The default minimum value for a new sequence is the minimum value of the data type of the sequence. This is zero for the TINYINT
data type and a negative number for all other data types.
NO MINVALUE
is probably a really bad design choice . You probably wanted zero or one.
1 |
[MAXVALUE <constant> | NO MAXVALUE |
Specifies the bounds for the sequence. The default maximum value for a new sequence is the maximum value of the data type of the sequence.
NO MAXVALUE
is probably another bad design. If you are going to use this to create a tag number, you have to worry about overflowing the length of your string.
If you are casting the numeric data type to strings for tag numbers, you will want to be sure that the numbers do not overflow in the conversion. And you probably do not need the upper limit of a data type.
1 |
[CYCLE | NO CYCLE] |
Property that specifies whether the sequence should restart from the minimum value (or maximum for descending sequences) or throw an exception when its minimum or maximum value is exceeded. The default cycle option for new sequences is NO CYCLE
. Note that cycling restarts from the minimum or maximum value, not from the start value.
1 |
[CACHE [<constant>] | NO CACHE] |
I am not going to explain Cache management. This is an implementation detail and I do programming.
In many financial applications, you cannot afford gaps. For example in Italy, invoice numbers have to be in a strict sequence by law. But you can lose all of the numbers in cache in the event of a major failure, so while it is faster, it is also dangerous. Remember that today “ROI” means “Risk of Incarceration”
OVER (ORDER BY ..)
Clause
Remember earlier in this article when I mentioned that how IDENTITY
in assigned to each element of a a set insertion was not deterministic? SEQUENCE
has a way to fix this problem; you can order the sequence! If another process is accessing the sequence object at the same time, the numbers returned could have gaps.
An OVER
clause applied to the NEXT VALUE FOR
function does not support the PARTITION BY
or the [ROW | RANGE]
subclauses for obvious reasons. The following additional rules apply when using the NEXT VALUE FOR
function with the OVER
clause:
Multiple calls to the NEXT VALUE FOR
function for the same sequence generator in a single statement must all use the same OVER clause definition. Again, this is obvious.
Multiple calls to the NEXT VALUE FOR
function that reference different sequence generators in a single statement can have different OVER clause definitions. They are separate schema objects.
If all calls to the NEXT VALUE FOR
function in a SELECT
statement specifies the OVER
clause, an ORDER BY
clause may be used in the SELECT
statement. Again, the SEQUENCE
is a separate schema object.
The OVER
clause is allowed with the NEXT VALUE FOR
function when used in a SELECT
statement or INSERT ... SELECT ...
statement. The NEXT VALUE FOR
function is not allowed in UPDATE
or MERGE
statements.
Programming Examples
To get a feel for how this works, create a simple sequence and play with it.
1 2 3 4 5 6 7 8 |
CREATE SEQUENCE Invoice_Seq AS INTEGER START WITH 1 INCREMENT BY 1 MINVALUE 1 MAXVALUE 99 NO CYCLE; |
Now, just play with it. Just hit “SELECT NEXT VALUE FOR Invoice_Seq;
” a few times, or in SSMS execute …
1 2 3 4 |
SELECT NEXT VALUE FOR Invoice_Seq; GO 99 SELECT NEXT VALUE FOR Invoice_Seq; |
After you hit 99 and then invoke the next value, you will get this message.
1 2 3 4 |
Msg 11728, Level 16, State 1, Line 1 The sequence object 'Invoice_Seq' has reached its minimum or maximum value. Restart the sequence object to allow new values to be generated. |
We did not talk about this “restart” option yet. It is a straightforward ALTER
statement.
1 |
ALTER SEQUENCE <sequence name> RESTART [WITH <constant>]; |
The WITH
option lets you assign the new starting value. If you do not give a specific value, the default is the original starting value. In this example, we can use:
1 2 |
ALTER SEQUENCE Invoice_Seq RESTART WITH 3; SELECT NEXT VALUE FOR Invoice_Seq; |
this will return 3. Most of the situations where you want to re-start a sequence can be done with the CYCLE
clause.
The use of the ORDER BY
clause can let you build groupings. Create a sequence groups of ten:
1 2 3 4 5 6 7 |
CREATE SEQUENCE Ten_Seq AS INTEGER START WITH 1 INCREMENT BY 1 MINVALUE 1 MAXVALUE 10 CYCLE; |
Now use it in a query that breaks the students into groups of ten
1 2 3 4 |
SELECT student_id, student_name, (NEXT VALUE FOR Ten_Seq OVER(ORDER BY student_id)) AS student_grouping FROM Students; |
Converting the sequence into, say, a four digit invoice serial number is pretty simple. It is important to use VARCHAR(n).
SELECT ‘IN:’ + RIGHT (‘0000’ + CAST (NEXT VALUE FOR Invoice_Seq AS VARCHAR(5)), 4);
But a fun example is that of using the same sequence in more than one table. Let’s go back the original service ticket example. When you come into the shop, you pull a ticket.
1 2 3 4 5 6 7 8 |
CREATE SEQUENCE Service_Ticket_Seq AS INTEGER START WITH 1 INCREMENT BY 1 MINVALUE 1 MAXVALUE 100 CYCLE; |
We have two departments in this delicatessen, meats and fish. If you have been to a Kosher market, you know that these two areas are kept separate. A ticket can be used in only one department.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
CREATE TABLE Meats (ticket_seq INTEGER NOT NULL PRIMARY KEY, meat_type VARCHAR(15) NOT NULL); CREATE TABLE Fish (ticket_seq INTEGER NOT NULL PRIMARY KEY, fish_type VARCHAR(15) NOT NULL); CREATE PROCEDURE Ticket_Service (@in_market_code CHAR(1), @in_product_name VARCHAR(15)) AS BEGIN DECLARE @local_ticket_seq INTEGER; SET @local_ticket_seq = NEXT VALUE FOR Service_Ticket_Seq; IF @in_market_code = 'M' INSERT INTO Meats VALUES (@local_ticket_seq, @in_product_name); ELSE IF @in_market_code = 'F' INSERT INTO Fish VALUES (@local_ticket_seq, @in_product_name); END; |
Now, let’ call the procedure a few times:
1 2 3 4 5 6 |
EXEC Ticket_Service 'M', 'Brisket'; EXEC Ticket_Service 'F', 'Lox'; EXEC Ticket_Service 'M', 'Chicken'; EXEC Ticket_Service 'M', 'Brisket'; EXEC Ticket_Service 'F', 'Sturgeon'; EXEC Ticket_Service 'F', 'Haddock'; |
And now let’s see how this works.
1 |
SELECT * FROM Meats; |
1 |
Brisket |
3 |
Chicken |
4 |
Brisket |
1 |
SELECT * FROM Fish; |
2 |
Lox |
5 |
Sturgeon |
6 |
Haddock |
If I UNION
the two tables, I get a complete sequence and I am sure their intersection is empty. Doing this without a SEQUENCE
is a lot harder. But I did resort to if-then-else flow control because I have two different tables. UGH!
I can put a SEQUENCE
in the DEFAULT
clause of the DDL for table:
1 2 3 4 5 |
CREATE TABLE Service_Tickets (ticket_nbr INTEGER DEFAULT NEXT VALUE FOR Service_Ticket_Seq, department_code CHAR(1) NOT NULL CHECK (department_code IN ('M', 'F'))); |
Now play with this code.
1 2 3 4 5 |
INSERT INTO Service_Tickets (department_code) VALUES ('M'); SELECT * FROM Service_Tickets; |
That is cool! But do not stop here. Let’s re-do the Meats and Fish tables with this feature, because we can then do inserts without the need for a procedure and still be sure of an empty intersection:
1 2 3 4 5 6 7 8 9 |
CREATE TABLE Meats (ticket_seq INTEGER DEFAULT NEXT VALUE FOR Service_Ticket_Seq PRIMARY KEY, meat_type VARCHAR(15) NOT NULL); CREATE TABLE Fish (ticket_seq INTEGER DEFAULT NEXT VALUE FOR Service_Ticket_Seq PRIMARY KEY, fish_type VARCHAR(15) NOT NULL); |
Now, to get you started, try these statements. I dropped the kosher theme 🙂
1 2 3 4 |
INSERT INTO Meats (meat_type) VALUES ('pig'); INSERT INTO Fish (fish_type) VALUES ('squid'); SELECT * FROM Meats; SELECT * FROM Fish; |
There are no guarantees as to how the sequence numbers will be assigned; it is pretty much first-come, first-served in the system. Did you notice that the sequences are the PRIMARY KEY
?
1 |
INSERT INTO Meats VALUES (8, 'Cat'); |
will give us:
1 2 3 |
Msg 2627, Level 14, State 1, Line 1 Violation of PRIMARY KEY constraint 'PK__Meats__467BFD06C5684404'. Cannot insert duplicate key in object 'dbo.Meats'. The duplicate key value is (8). The statement has been terminated. |
And this is all declarative code!
Load comments