SQL Server SEQUENCE Basics

The SEQUENCE statement introduced in SQL Server 2012 brings the ANSI SQL 2003 standard method of generating IDs. This is a great relief to database professionals as it solves some problems what are awkward to solve with the IDENTITY property. Joe Celko explains the basics of using a SEQUENCE

SQL Server now has the ANSI/ISO Standard CREATE SEQUENCE statement. Hooray! The bad news is that most programmers will not be aware of the significance of the underlying math. A SEQUENCE is not an IDENTITY. That propriety feature in T-SQL is a table property inherited from the old Sybase/UNIX days. It counts the number of insertion attempts, not even successes, on one particular machine. This goes back to how files were managed on UNIX systems in the 1970s. In those systems, you needed record numbers to locate the data, so T-SQL exposed this count as IDENTITY. Also, when a record was deleted, a bit flag on each record might be set to signal a garbage collection routine to remove this data during a file utility clean up.

Obviously, IDENTITY increments its value from physical insertion to physical insertion; no big jumps, no gaps. Again, IDENTITY is not a column; it is a table property. That means it cannot be a key by definition. Remember a key is a subset of columns that are unique in every row of a table. It is not an integer; you cannot do meaningful math on it. It is non-deterministic. If you insert a set of more than one row into a table, the rows will be inserted one at time in an order determined by the indexes, internal state of the hardware and number of sessions open at the time.

We have, for a long time, created a table of integers that has all sorts of uses. It starts at 1 and goes to some sufficiently large value. Its name depends on the author of the table. You will see “Numbers”, “Sequence”, “Tally_Table” and “Series” among others. Obviously, we can no longer use “Sequence”; it is a reserved word! I used it myself, and so I’ll have to re-write my books from now on. Darn. I always thought that “Numbers” was too generic; which numbers?? I never liked “Tally”, since a tally is a current score or amount, a running total, or count. It is not a set, but a process. I settled on “Series” for my stuff since a series is a completed (infinite) set, but it is not the right word.

Now, let’s do math.

What is the Difference Between a Sequence and a Series?

A sequence is a list of numbers. The order in which the numbers are listed is important, so for instance {1, 2, 3, 4, 5, …} is one sequence, and {2, 1, 4, 3, 6, 5, …} is an entirely different sequence. This is important. It means we can talk about the i-th member of the sequence.

A series is a sum of numbers. A classic example is: 1 + 1/2 + 1/4 + 1/8 + 1/16 + … which converge to 2 (or you can loosely say it will equal 2 at infinite). The order of terms can change the limit to which the series converges or it might not (“absolutely convergent”). Okay, too much math for a database article.

Sequences and series are closely related to each other. But a sequence is not summed.

Tag Numbers versus Sequence

A tag number is string, usually of fixed length, made up of digits. The digits are in sequence to make them easy to sort and to generate. Think of a serial number on a manufactured product. The most common example is a vehicle identification number (VIN). This is a unique code is used by the automotive industry to identify individual motor vehicles,  as defined in ISO 3833. The positions 12 to 17 are a sequence prefixed by codes for the Manufacturer Identifier, some vehicle attributes, a check digit, model year, and plant code.

The check digit is a major reason we like tag numbers. This is a topic in itself, but the idea is that we can take each digit in the tag number and run it thru a formula as an integer. We get a result, usually another digit, and we attach it to the tag number. For example, the Luhn algorithm is a common method

defined by ISO/IEC 7812-1. We like it because it is simple to put in hardware.

1.      Compute the sum of the digits .

2.      Take the units digit from this total.

3.      Subtract it from 10.

It’s not a great check digit, but it catches most of the common input errors – missing digits, extra digits, wrong digits and pairwise transposes.

Now let’s into the SQL.

The CREATE SEQUENCE Statement

This statement creates a schema level object that is accessible to any user. It is not part of a table. It is not a procedure that belong to one user. The idea is that a user can invoke the SEQUENCE with a special syntax that is used wherever an integer value of the SEQUENCE data type would work.

If you want a physical model, imagine you are in the butcher store. You walk in and pull a service ticket number from a roll of tickets on the counter. Sequence numbers are generated outside the scope of the current transaction, just like the tickets. The numbers are consumed whether the transaction using the ticket number is actually served (committed) or they walk out (rolled back).

I will explain the OVER() clause later; just keep the simple butcher store model in mind for now. Let’s look at the roll of tickets (i.e. CREATE SEQUENCE). It has to be declared with parameters that define behavior.

Let’s go thru the BNF in detail. The <sequence_name> and its qualifiers explains itself. This is pure SQL. Sequence numbers are generated outside the scope of the current transaction. They are consumed whether the transaction using the sequence number is committed or rolled back. Think about the butcher shop ticket that gets dropped on the floor.

A sequence can be defined as any integer type: That means TINYINT, SMALLINT, INTEGER and BIGINT; but it also allows DECIMAL (s, 0) and NUMERIC(s, 0) data types. You can also a user-defined data type that is based on one of the allowed types. Do not do that if you value portability.

If no data type is provided, BIGINT is the default. Do not use this unless you really need  a range of -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807), which makes ‘Big Data’ seem like note on the back of the hand. As with most defaults, this is the largest, safest value in the problem space.

This is the first value returned by the sequence. The START value must be a value between the minimum and maximum values. The default start value for a new sequence is the minimum value for an ascending sequence and the maximum value for a descending sequence. Note this is a constant; no function calls.

Value used to increment (or decrement if negative) the value of the sequence for each call to the NEXT VALUE FOR. If the increment is a negative value, the sequence is descending; otherwise, it is ascending. The increment cannot be 0, obviously. The default increment for a new sequence is 1. Again, this is a constant and not a function call.

Specifies the bounds for the sequence. The default minimum value for a new sequence is the minimum value of the data type of the sequence. This is zero for the TINYINT data type and a negative number for all other data types.

NO MINVALUE is probably a really bad design choice . You probably wanted zero or one.

Specifies the bounds for the sequence. The default maximum value for a new sequence is the maximum value of the data type of the sequence.

NO MAXVALUE is probably another bad design. If you are going to use this to create a tag number, you have to worry about overflowing the length of your string.

If you are casting the numeric data type to strings for tag numbers, you will want to be sure that the numbers do not overflow in the conversion. And you probably do not need the upper limit of a data type.

Property that specifies whether the sequence should restart from the minimum value (or maximum for descending sequences) or throw an exception when its minimum or maximum value is exceeded. The default cycle option for new sequences is NO CYCLE. Note that cycling restarts from the minimum or maximum value, not from the start value.

I am not going to explain Cache management. This is an implementation detail and I do programming.

In many financial applications, you cannot afford gaps. For example in Italy, invoice numbers have to be in a strict sequence by law. But you can lose all of the numbers in cache in the event of a major failure, so while it is faster, it is also dangerous. Remember that today “ROI” means “Risk of Incarceration”

OVER (ORDER BY ..) Clause

 Remember earlier in this article when I mentioned that how IDENTITY in assigned to each element of a a set insertion was not deterministic? SEQUENCE has a way to fix this problem; you can order the sequence! If another process is accessing the sequence object at the same time, the numbers returned could have gaps.

An OVER clause applied to the NEXT VALUE FOR function does not support the PARTITION BY or the [ROW | RANGE] subclauses for obvious reasons. The following additional rules apply when using the NEXT VALUE FOR function with the OVER clause:

Multiple calls to the NEXT VALUE FOR function for the same sequence generator in a single statement must all use the same OVER clause definition. Again, this is obvious.

Multiple calls to the NEXT VALUE FOR function that reference different sequence generators in a single statement can have different OVER clause definitions. They are separate schema objects.

If all calls to the NEXT VALUE FOR function in a SELECT statement specifies the OVER clause, an ORDER BY clause may be used in the SELECT statement. Again, the SEQUENCE is a separate schema object.

The OVER clause is allowed with the NEXT VALUE FOR function when used in a SELECT statement or INSERT ... SELECT ... statement. The NEXT VALUE FOR function is not allowed in UPDATE or MERGE statements.

Programming Examples

To get a feel for how this works, create a simple sequence and play with it. 

Now, just play with it. Just hit  “SELECT NEXT VALUE FOR Invoice_Seq;” a few times, or in SSMS execute …

After you hit 99 and then invoke the next value, you will get this message.

We did not talk about this “restart” option yet. It is a straightforward ALTER statement.

 The WITH option lets you assign the new starting value. If you do not give a specific value, the default is the original starting value. In this example, we can use:

 this will return 3. Most of the situations where you want to re-start a sequence can be done with the CYCLE clause.

 The use of the ORDER BY clause can let you build groupings. Create a sequence groups of ten:

Now use it in a query that breaks the students into groups of ten

 Converting the sequence into, say, a four digit invoice serial number is pretty simple. It is important to use VARCHAR(n).

 SELECT ‘IN:’ + RIGHT (‘0000’ + CAST (NEXT VALUE FOR Invoice_Seq AS VARCHAR(5)), 4);

 But a fun example is that of using the same sequence in more than one table. Let’s go back the original service ticket example. When you come into the shop, you pull a ticket.  

We have two departments in this delicatessen, meats and fish. If you have been to  a Kosher market, you know that these two areas are kept separate. A ticket can be used in only one department.

 Now, let’ call the procedure a few times:

 And now let’s see how this works.

1

Brisket

3

Chicken

4

Brisket

2

Lox

5

Sturgeon

6

Haddock

 If I UNION the two tables, I get a complete sequence and I am sure their intersection is empty. Doing this without a SEQUENCE is a lot harder. But I did resort to if-then-else flow control because I have two different tables. UGH!

I can put a SEQUENCE in the DEFAULT clause of the DDL for table:

Now play with this code.

 That is cool! But do not stop here. Let’s re-do the Meats and Fish tables with this feature, because we can then do inserts without the need for a procedure and still be sure of an empty intersection:

Now, to get you started,  try these statements. I dropped the kosher theme 🙂

 There are no guarantees as to how the sequence numbers will be assigned; it is pretty much first-come, first-served in the system. Did you notice that the sequences are the PRIMARY KEY?

will give us:

And this is all declarative code!