The VALUES clause or building tables out of nothing

The VALUES clause is probably one of the most misused features in SQL. If you look at SQL forums online, you’ll see people use it as the second clause in an insertion statement, but they only use it to construct a single row at a time, thus:

BEGIN

INSERT INTO Zodiac (astro_sign, astro_start_date, astro_end_date)

VALUES ('Aries', '2022-03-21', '2022-04-19');

INSERT INTO Zodiac (astro_sign, astro_start_date, astro_end_date)

VALUES ('Taurus', '2022-04-20', '2022-05-20');

…

INSERT INTO Zodiac (astro_sign, astro_start_date, astro_end_date)

VALUES ('Pisces', '2023-02-19', '2023-03-20');

END;

Each insertion statement ends with a semi-colon, so they will be executed separately and in the order presented. An optimizer doesn’t dare combine them because there might be a forward reference to previous insertions.

I think people write this kind of code because that this is how you would read punch cards. Each card goes into a card reader, gets buffered, and written in the order presented to the magnetic tape or disk file. Welcome to 1960! Stop mimicking old programming languages like FORTRAN or BASIC that had WRITE statements and put one record at a time into a file. Start thinking of working with entire sets.

Using VALUES to insert rows

The VALUES clause is more appropriately called a table constructor. Each row constructor within the table is a comma-separated list enclosed in parentheses. Officially, there is an optional keyword ROW. They can be placed at the start of each list. Nobody does this, and it is a bit redundant, but was required in MySQL.

One of the worst ways of constructing a table is to use the CREATE or DECLARE construct to build a temporary table, load it with insertion statements, and finally insert the table into the desired destination. This leads to multiple statements with no way to really optimize the insertion and shows that you are really not thinking in sets yet.

The entire zodiac can be inserted with a single statement like this:

INSERT INTO Zodiac (astro_sign, astro_start_date, astro_end_date)

VALUES (

('Aries', '2022-03-21', '2022-04-19'),

('Taurus', '2022-04-20', '2022-05-20'),

('Gemini', '2022-05-21', '2022-06-21'),

('Cancer', 2022-06- 22', '2022-07-22'),

('Leo', '2022-07-23', '2022-08-22'),

('Virgo', '2022-08-23', '2022-09-22'),

('Libra', '2022-09-23', '2022-10-23'),

('Scorpius', '2022-10-24', '2022-11-21'),

('Sagittarius', '2022-11-22', '2022-12-21'),

('Capricorn', '2022-12-22', '2023-01-19'),

('Aquarius', '2023-01-20', '2023-02-18'),

('Pisces', '2023-02-19', '2023-03-20')

);

Given a whole set of rows, the optimizer can deal with a single atomic statement. Not only does it save execution time as compared to the row-at-a-time model of insertion, but it presents the optimizer with an opportunity to improve things. The insertion statement can rearrange the list of new rows and pick an optimal ordering. It also means that if one of my rows had an error in it, I wouldn’t have to back out all of the other rows. If I wanted a proper ACID transaction model, I would’ve had to back out each individual insert up until I came to the insertion that gave me the error.

Here’s the basic syntax. Please note that besides including an expression of the proper data type, you can use the keywords DEFAULT or NULL in a row constructor. Obviously, those values must make sense in relation to the declaration of the table into which you are inserting.

VALUES (<row value expression list>) [ , ...n ]

<row value expression list> ::= {<row value expression> } [ , ...n ]

<row value expression> ::= { DEFAULT | NULL | <expression> }

Please remember that an expression is not always a simple constant. In fact, it’s very handy to use the CAST (<exp> AS <data type>) function as a way to assure that a column in the constructed virtual table has a known data type:

1 2	VALUES (CAST(‘foobar’, AS NVARCHAR(10), CAST(42 AS INTEGER), CAST(3.14159 AS REAL))

The AS keyword can also be used, to give the constructed tables each a name. Here is skeleton:

SELECT X.a, X.b, X.c, Y.a, Y.b, Y.c

FROM

(VALUES (1,2,3) , (4,5,6) ) AS X(a, b, c) ,

(VALUES (1,2,3), (4,5,6) ) AS Y(a,b,c)

WHERE X.a =Y.a

AND X.b = Y.b

AND X.c <= 0.0;

MERGE with VALUES Clause

The MERGE statement was added to Standard SQL several years ago. It was based on a proposal by ANSI representatives from Oracle and IBM, but forms of it had already existed in other products, though under a different name. The most common one was UPSERT from Postgres. Let’s jump right into it.

One table expression is the target, the table you are trying to modify. The other table expression is the source, the table that provides the modifications. Presumably, you want the target to persist, but you don’t need the source to persist after the updates and insertions are done.

The MERGE clause defines the target; the USING clause defines the source. And the ON clause matches the two tables. The WHEN [NOT] MATCHED ...THEN clauses determine the action to be taken

MERGE INTO Sales.Sales_Reasons AS Target

USING (VALUES ('Recommendation', 'Other'),

('Review', 'Marketing'), ('Internet', 'Promotion'))

AS Source (new_sales_name, new_reason_type)

ON Target.sales_name = Source.new_sales_name

WHEN MATCHED

THEN UPDATE SET reason_type = Source.new_reason_type

WHEN NOT MATCHED

THEN INSERT (sales_name, reason_type)

VALUES (new_sales_name, new_reason_type);

It’s easier to think of the MERGE statement as a program, written as a single statement Instead of having IF–THEN–ELSE logic or CASE expressions in multiple statements.

Obviously, updating makes sense only when there is a match, and inserting makes sense only when there is not a match. The standards allow for either of these clauses to include an optional … AND <search condition>, so you can add quite a bit of logic to this one statement. Technically, the WHEN clauses list can finish with the ELSE IGNORE; it acts as a placeholder just as the ELSE clause did in the CASE expression. Microsoft has more extensions to the syntax, and there have been some performance issues. If you are using it in SQL Server, I strongly suggest checking both the syntax and current performance in whichever version of SQL Server you’re running.

Conclusion

The VALUES clause is ANSI standard and implemented by many relational database vendors. The VALUES clause can save typing, and it’s also the rare case when easier can mean better performance because the rows will be treated as a set.

Register for Simple Talk

The VALUES clause or building tables out of nothing

Using VALUES to insert rows

MERGE with VALUES Clause

Conclusion

Article tags

About the author

Joe Celko

Joe's contributions

Articles

Books

Top topics

Joe's latest contributions:

State Transition Constraints

Data Encoding Schemes

Data Elements on a Nominal Scale

Using VALUES to insert rows

MERGE with VALUES Clause

Conclusion

Article tags

Recommended

About the author

Joe Celko

Joe's contributions

Articles

Books

Top topics

Joe's latest contributions:

State Transition Constraints

Data Encoding Schemes

Data Elements on a Nominal Scale