Looking at VIEWs, Close Up

Views in SQL can be difficult. It isn't easy to judge when to use them, It isn't always obvious how to determine if a view can be indexed or if it is updateable. Joe Celko takes a tricky topic and comes up with some helpful guidelines.

The original SQL-86 standards introduced a bit of “standard-speak”: a word that has continued to be used ever since. This word is ‘effectively’. ‘Effectively’ is used to describe the final effect of a statement. We do not specify the implementation. We specify the results. You would probably assume that this is how all standards for languages would be done, but the standards for both FORTRAN and COBOL originally specified contiguous storage and other such implementation details. As far as I know, we were the first one to get away from that model with the SQL Standards.

Views are virtual tables, defined by a SELECT statement, and it has to effectively behave as if the result of that statement was an actual physical base table. This meant that, anywhere in the grammar that allows a table, you can use the view. Views do not have space allocated for their data until they are invoked, so they are cheap in terms of resources.

Unfortunately, most SQL products show views and base tables separately in their object browsers as if they were fundamentally different. I suspect that the reason is that the text of the query has to be saved, for reasons will get into in a minute, and so it fits into a different part of the information schema. It is probably just easier to avoid rolling base tables and views together for displaying in a tool.

The Standard SQL syntax for the VIEW definition is

The <view column list> is optional; when it is not given, the VIEW will inherit the column names from the query. The number of column names in the <view column list> has to be the same as the degree of the query expression. If any two columns in the query have the same column name, you must have a <view column list> to resolve the ambiguity. The same column name cannot be specified more than once in the <view column list>.

The <levels clause> option in the WITH CHECK OPTION did not exist in SQL-89 and has no effect on queries, but only on UPDATE, INSERT INTO and DELETE FROM statements. We will get to the details of this underused feature. It has to do with nested views and how they are “unrolled” by a database event.

Implementations

There are two common ways of implementing views in SQL products. The first way is to save the text, copy it inline like a derived table or CTE, and pass the expanded query to the optimizer. If you are an old assembly language or C programmer, think of an in-line text macro.

The advantage of in-line expansion is that it is simple, and lets the optimizer work with the whole statement. But this means that the same code will be executed over and over, even when the data is constant, by each query that uses this view.

This is actually a pretty common situation. Imagine that you have views that do monthly summaries, so that each time you referenced the “Sales_Summary” view, it is shared by many end-of-the-month reports. You are doing a pretty complicated computation over and over again. What was fine for the single query is suddenly less beneficial for the application as a whole.

The second method, which follows the Standards exactly, is to execute the view’s query text, and materialize it as a temporary table. The advantage of this is that if several sessions are sharing the same view, then they can get to this materialized table and share it. But temporary tables are not always cached quite the same as base tables.

IBM calls this approach a ‘materialized query table’ (MQT) and has more syntax to control how the system stores and refreshes the data. The view can be refreshed explicitly by the user or automatically by the system. But SQL Server users have to do a little more manual programming to get the same results.

Indexing and Views

SQL Server allows you to add indexes to views.This indexing is done for performance purposes. To make an indexed view, you need to create a unique clustered index on that view. This is effectively a key. I hope you notice the word ‘effectively’. This also requires that the view must be materialized. You could construct something similar by using temporary tables to store the data, but a temporary table is static. The indexed view will need to automatically maintain the view definition when it is accessed.

After you create the unique clustered index, you can then create multiple nonclustered indexes on the view. You can check whether you can do this by using the ObjectProperty function’s ‘IsIndexed’ argument. If you then drop the indexes, the view will still behave like a standard view.

Updateable and Read-Only views

Unlike base tables, views are either updateable or read-only, but not both. INSERT, UPDATE, and DELETE operations are allowed on updateable views and base tables, subject to any other constraints. INSERT, UPDATE, and DELETE are not allowed on read-only views, but you can change their base tables, and get the new data with the next invocation of the view.

An updateable VIEW is one that can associate each of its rows with exactly one row in an underlying base table. When the view is changed, the changes must pass through the view to that underlying base table unambiguously. Updateable views in Standard SQL are defined only for queries that meet these criteria

  1. They are built on only one table,
  2. No GROUP BY clause
  3. No HAVING clause
  4. No aggregate functions
  5. No calculated columns
  6. No UNION, INTERSECT or EXCEPT
  7. No SELECT DISTINCT clause
  8. Any columns excluded from the view must be NULL-able or have a DEFAULT in the base table, so that a whole row can be constructed for insertion.

By implication, the view must also contain a key of the table. In short, the database engine must be absolutely sure that each row in the view maps back to one and only one row in the base table. Some updating is handled by the CASCADE option in the referential integrity constraints on the base tables, not by the VIEW declaration.

The definition of updatability in Standard SQL is actually pretty limited, but very safe. The database system could look at information it has in the referential integrity constraints to widen the set of allowed updateable views. The SQL standard definition of an updateable view is actually a subset of the possible updateable views, and a very small subset at that. The major advantage of this definition is that it is based on syntax and not semantics. For example, these views are logically identical:

But Foo1 is updateable and Foo2 is not. While I know of no formal proof, I suspect that determining if a complex query resolves to an updateable query for allowed sets of data values possible in the table is an NP-complete problem.

Types ofViews

Without going into details, here is a list of types of queries that can yield updateable views, as taken from “VIEW Update Is Practical”, (Goodman 1990):

  1. Projection from a single table (Standard SQL)
  2. Restriction/projection from a single table (Standard SQL)
  3. UNION views
  4. Set Difference views
  5. One-To-One Joins
  6. One-To-One Outer Joins
  7. One-To-Many Joins
  8. One-To-Many Outer Joins
  9. Many-To-Many Joins
  10. Translated and Coded columns

This does not mean that all views of these types are updateable, but some of them can be with conditions.

Single-Table Projection and Restriction

In practice, many views are projections or restrictions on a single base table. This is a common method for obtaining security control by removing rows or columns that a particular group of users is not allowed to see. These views are usually implemented as in-line macro expansion, since the optimizer can easily fold their code into the final query plan.

Calculated Columns

One common use for a view is to provide summary data across a row. For example, given a table with measurements in metric units, we can construct a view that hides the calculations to convert them into US Customary units or English (Imperial) units.

It is important to be sure that you have no problems with NULL values when constructing any calculated column. For example, given a Personnel table with columns for both salary and commission, you might construct this view:

Office workers do not get commissions, so the value of their commission column will be NULL, so we use the COALESCE() function to change the NULLs to zeros.

If the values used in the computation come from one table, life is then easy for updating the view. But the computation requires values from two or more tables in the view, then I need to have an inverse function that also distributes the values out to the proper base tables. This is not possible in all cases.

Translated Columns

Another common use of a view is to translate codes into text or other codes by doing table lookups. This is a special case of a joined view based on a foreign key relationship between two tables. For example, an order table might use a part number that we wish to display with a part name on an order entry screen. This is done with a JOIN between the order table and the inventory table, thus:

Sometimes the original code is kept and sometimes it is dropped from the view. As a general rule, it is a better idea to keep both values even though they are redundant. The redundancy can be used as a check for users, as well as a hook for nested joins in either of the codes.

The idea of JOIN views to translate codes can be expanded to show more than just one translated column. The result is often a “star” query with one table in the center, joined by FOREIGN KEY constraints to many other tables to produce a result that is more readable than the original central table.

Missing values are a problem. If there is no translation for a given code, no row appears in the view, or if an outer join was used, a null will appear. The programmer should establish a referential integrity constraint to CASCADE changes between the tables to prevent loss of data.

Grouped Views

A grouped view is based on a query with a GROUP BY clause. Since each of the groups may have more than one row in the base from which it was built, these are necessarily read-only views. Such views usually have one or more aggregate functions and they are used for reporting purposes. They are also handy for working around weaknesses in SQL. Consider a view that shows the largest sale in each state. The query is straightforward:

SQL does not require that the grouping column(s) appear in the select clause, but it is a good idea in this case.

UNION-edViews

Until recently, a view based on a UNION or UNION ALL operation was read-only because there is no way to map a change in the view to just one row in only one of the base tables. The UNION operator will remove duplicate rows from the results. Both the UNION and UNION ALL operators hide which table the rows came from. Such views must use a <view column list>, because the columns in a UNION [ALL] have no names of their own. In theory, a UNION of two disjoint tables, neither of which has duplicate rows in itself should be updateable.

Some of the major databases, such as Oracle and DB2, support inserts, updates and delete from such views in some circumstances. For example, DB2 attempts to insert any given row into each of the tables underlying the UNION ALL view. It then counts how many tables accepted the row. It has to process the entire view, one table at a time and collect the results.

  1. If exactly one table accepts the row, the insert is accepted.
  2. If no table accepts the row, a “no target” error is raised.
  3. If more than one table accepts the row, then an “ambiguous target” error is raised.

The overhead can be awful. Since SQL is set-oriented, a set might have all three conditions, so the method of resolving this is complicated.

Nested Views

A point that is often missed, even by experienced sql programmers, is that a view can be built on other views. The only restrictions are that:

  • Circular references within the query expressions of the views are illegal
  • A view must ultimately be built on base tables

One problem with nested views is that different updateable views can reference the same base table at the same time. If these views then appear in another view, it becomes hard to determine what has happened when the highest-level view is changed. As a very simple example, consider a table with two keys:

We have now tried the escargots and decided that we wish to change our opinion of them:

Our French user has just tried haggis and decided to insert a new row for his experience:

The row that is created is (NULL, 3, NULL, 'Le swill'), since there is no way for view French_Words to get to the view English_Words columns. Likewise, the English view user can construct a row to record his translation, (3, NULL, 'Haggis', NULL). But neither of them can consolidate the two rows into a meaningful piece of data.

To delete a row is also to destroy data; the French-speaker who drops ‘croissants’ from the table also drops ‘muffins’ from English_Words.

WITH CHECK OPTION Clause

If WITH CHECK OPTION is specified, the viewed table has to be updateable. This is actually a fast way to check how your particular SQL implementation handles updateable views. Try to create a version of the view in question using the WITH CHECK OPTION and see if your product will allow you to create it.

The WITH CHECK OPTION is part of the SQL-89 standard, which was extended in Standard SQL by adding an optional <levels clause>. CASCADED is implicit if an explicit LEVEL clause is not given. Consider a view defined as…

… and now UPDATE it with …

The UPDATE will take place without any trouble, but the rows that were previously seen now disappear when we use V1 again. They no longer meet the WHERE clause condition! Likewise, an INSERT INTO statement with values (col1 = 'b') would insert just fine, but its rows would never be seen again in this view. Views created this way will always have all the rows that meet the criteria and that can be handy. For example, you can set up a view of rows with a status code of ‘to be done’: Work on them, and change a status code to ‘finished’, and they will disappear from your view. The important point is that the WHERE clause condition was checked only at the time when the view was invoked.

The WITH CHECK OPTION makes the system check the WHERE clause condition upon INSERTion or UPDATE. If the new or changed row fails the test, the change is rejected and the view remains the same. Thus, the previous UPDATE statement would get an error message and you could not change certain columns in certain ways. For example, consider a view of salaries under $30,000 defined with a WITH CHECK OPTION to prevent anyone from giving a raise above that ceiling.

The WITH CHECK OPTION clause does not work like a CHECK constraint.

The WITH CHECK OPTION must be TRUE while the CHECK constraint can be either TRUE or UNKNOWN. Once more, you need to watch out for NULLs.

Standard SQL has introduced an optional <levels clause>, which can be either CASCADED or LOCAL. If no <levels clause> is given, a <levels clause> of CASCADED is implicit. The idea of a CASCADED check is that the system checks all the underlying levels that built views well as the WHERE clause condition in the view itself. If anything causes a row to disappear from the view, the UPDATE is rejected. The idea of a WITH LOCAL check option is that only the local WHERE clause is checked. Consider two views built on each other from the salary table:

If neither view has a WITH CHECK OPTION, the effect of updating Mediumpay by increasing every salary by $1,000 will be passed without any check to Lowpay. Lowpay will pass the changes to the underlying Personnel table. The next time Mediumpay is used, Lowpay will be rebuilt in its own right and Mediumpay rebuilt from it, and all the employees will disappear from Mediumpay.

If only Mediumpay has a WITH CASCADED CHECK OPTION on it, the UPDATE will fail. Mediumpay has no problem with such a large salary, but it would cause a row in Lowpay to disappear, so Mediumpay will reject it. However, if only Mediumpay has a WITH LOCAL CHECK OPTION on it, the UPDATE will succeed.Mediumpay has no problem with such a large salary, so it passes the change along to Lowpay. Lowpay, in turn, passes the change to the Personnel table and the UPDATE occurs. If both views have a WITH CASCADED CHECK OPTION, the effect is a set of conditions, all of which have to be met. The Personnel table can accept UPDATEs or INSERTs only where the salary is between $100 and $250.

This can become very complex. Consider an example from an ANSI X3H2 paper by Nelson Mattos of IBM. Let us build a five-layer set of views, using xx and yy as place holders for CASCADED or LOCAL, on a base table T1 with columns c1, c2, c3, c4, and c5, all set to a value of 10, thus:

When we set each one of the columns to zero, we get different results, which can be shown in this chart, where ‘S’ means success and ‘F’ means failure:

xx/yy c1 c2 c3 c4 c5
CASCADED/CASCADED F F F F S
LOCAL/CASCADED F F F F S
LOCAL/LOCAL S F S F S
CASCADED/LOCAL F F S F S

To understand the chart, look at the last line. If xx = CASCADE and yy = LOCAL, updating column c1 to zero via V5 will fail, whereas updating c5 will succeed. Remember that a successful UPDATE means the row(s) disappear from V5.

Follow the action for UPDATE V5 SET c1 = 0; VIEW V5 has no with check options, so the changed rows are immediately sent to V4 without any testing. VIEW V4 does have a WITH LOCAL CHECK OPTION, but column c1 is not involved, so V4 passes the rows to V3. VIEW V3 has no with check options, so the changed rows are immediately sent to V2. VIEW V2 does have a WITH CASCADED CHECK OPTION, so V2 passes the rows to V1 and awaits results. VIEW V1 is built on the original base table and has the condition c1 > 5, which is violated by this UPDATE. VIEW V1 then rejects the UPDATE to the base table, so the rows remain in V5 when it is rebuilt. Now the action for

VIEW V5 has no with check options, so the changed rows are immediately sent to V4, as before. VIEW V4 does have a WITH LOCAL CHECK OPTION, but column c3 is not involved, so V4 passes the rows to V3 without awaiting the results. VIEW V3 is involved with column c3 and has no with check options, so the rows can be changed and passed down to V2 and V1, where they UPDATE the base table. The rows are not seen again when V5 is invoked, because they will fail to get past VIEW V3. The real problem comes with UPDATE statements that change more than one column at a time. For example, …

…will fail for all possible combinations of levels clauses in the example schema. I strongly advise againstthis kind of coding. I can be very powerful, but too hard to maintain and understand.

WITH CHECK OPTION as CHECK() clause

You can use a view that has a WITH CHECK OPTION to enforce subquery constraints. These are constraints that use a schema or table level condition instead of just the row level constraints in CHECK() clauses In full ANSI SQL you can use the CREATE ASSERTION statement to do this, but SQL Server does not have that feature yet. Let’s fake it!

For example, consider a hotel registry that needs to have a rule that you cannot add a guest to a room that another is or will be occupying. Instead of writing the constraint directly, like this:

The schedule_right constraint is fine, since it has no subquery, but many products will choke on the no_double_booking constraint. Leaving the no_double_booking constraint off the table, we can construct a view on all the rows and columns of the Hotel base table and add a WHERE clause which will be enforced by the WITH CHECK OPTION.

For example,

will give a WITH CHECK OPTION clause violation on the second INSERT INTO statement, as we wanted.

Dropping Views

Views, like tables, can be dropped from the schema. The Standard SQL syntax for the statement is:

Vendors had different DROP behaviors in their SQL implementation. When you dropped a view, the engine usually removed the appropriate row from the schema tables. You found out about dependencies when you tried to use views built on other views that no longer existed, or whose name was re-used. This is not a good design, but it I what we had.

Likewise, dropping a base table could cause the same problem when the view was accessed. You did not even get a warning! The cascade option will find all other views that use the dropped view and remove them also. If RESTRICT is specified, the view cannot be dropped if there is anything that is dependent on it.

Using Views

The best way to approach views is to think of how a user wants to see the database and then give him a set of views that make it look as if the database had been designed just for his applications.There are two parts of this design principle. First, use views to hide complexity and secondly, to enforce consistency. One way to figure out what views you should have is to inspect the existing queries for repeated subqueries, CTEs, derived tables or expressions. These are good candidates for views.

Complexity and consistency are related. It is very hard to consistently reproduce a complex criteria. For example, it. We have an elaborate business rule about who can do what, we do not want to keep writing it over and over and over. It would be very much nicer to have a view of “employees who have level XYZ certification” that we can use, without this having to be computed over and over and over from a dozen tables.

One of the major uses of views is for security. The DBA can choose to hide certain columns from certain classes of users through a combination of security authorizations (that is the DCL or Data Control Language) and views.

However, while views can be used to mask columns, apply obscuring functions, or even change data, remember that the DCL (Data Control Language) also has a place in security. My personal opinion is that we do not spend enough time teaching DCL, but that is another topic.