In Oracle, under certain circumstances, an implicit data type conversion precludes the use of indexes. Perhaps you have a vague idea of what an implicit data type conversion hierarchy is, and you might even ignore the subtlety of the implicit data type conversion direction within a query predicate expression. Hopefully this article will shed some light on this concept.
1. Predicate Definition
The most common form of a predicate is:
1 |
<expression> <operator> <expression> |
For the sake of simplicity we are going to consider implicit conversion during SELECT statements and we’ll suppose the predicate to be of the following form:
1 |
<column> <operator> <literal or bind variable> |
Any function explicitly or implicitly applied to the <column> will prevent the index on that column from being used. This is why the <operator> should always operate on expressions of the same data type. When these two expressions are of a different data type, an implicit data type conversion occurs. In this situation the direction of the implicit conversion is of crucial importance. When this conversion is processed from right to left or, when the data type of the <literal or bind variable> is converted to that of the <column>, then there will very probably be no impact on the performance of the corresponding query. This is because such a conversion will only be done once and will also not pre-empt the index on the <column> to be used. If, instead, the implicit conversion is processed the other way around (meaning that when the <column> data type is aligned with that of the <literal value or bind variable>) then this might dramatically degrade the performance of the underlying query.
2. Data Type Conversion Hierarchy
Before we get carried away with this new terminology, let’s ask the question: how is this data type implicit conversion hierarchy decided? In Oracle (and in MSSQL as well) there is an implicit data type conversion hierarchy which dictates the direction in which Oracle has to apply the data type conversion. For the most common data types this direction can be summarized as follows: Oracle always converts string to number and string to date.
1 2 |
VARCHAR → NUMBER VARCHAR → DATE |
Simply put, this means that when a character <column> is compared to an <expression> of a number data type, then it is the <column> that will undergo an implicit data type conversion, ignoring the existing index on this column. When a number <column> is compared to an <expression> of character data type then it is the <expression> that will be converted to the column data type – probably without any significant impact.
By now you’ve hopefully realised you should:
- Always ensure that the operator is applied on columns and expressions of the same data type.
- Or, in the case of implicit data type conversion, ensure that the conversion is done from the data type of the expression to that of the column and not the other way around.
3. Oracle Data Type Implicit Conversion Example
Now it’s time to put what we’ve learned into action. Here’s how I created the table and the columns to demonstrate the above conversion hierarchy and direction:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
create table t1(n1 number, v1 varchar2(10), d1 date); insert into t1 select rownum n1 , rownum v1 , sysdate + dbms_random.value(0,365) from dual connect by level <= 1e3; create index t1_n1_idx on t1(n1); create index t1_v1_idx on t1(v1); create index t1_d1_idx on t1(d1); exec dbms_stats.gather_table_stats (user, 't1'); |
As shown below, the table t1 consists of three columns, each of a different data type:
1 2 3 4 5 6 |
SQL> desc t1 Name Null? Type ---- ----- ------------ N1 NUMBER V1 VARCHAR2(10) D1 DATE |
In the following query comparing the varchar2 v1 column to a number, the literal value is executed and its corresponding execution plan is displayed:
1 2 3 4 5 6 7 8 9 10 11 12 |
SQL> select count(1) from t1 where v1 = 1; ------------------------------------------- | Id | Operation | Name | Rows | ------------------------------------------- | 0 | SELECT STATEMENT | | | | 1 | SORT AGGREGATE | | 1 | |* 2 | TABLE ACCESS FULL| T1 | 1 | ------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(<span style="color: red;">TO_NUMBER</span>("V1")=1) |
In accordance with what we have seen in the preceding section, when the varchar2 <column> v1 is compared to a number, Oracle applies the hierarchy and direction of the implicit data type conversion and aligns the v1 datatype to that of the number literal value. This is clearly visible in the predicate part where a to_number function has been applied on the v1 column generating a full table scan instead of an index range scan.
Let’s consider the same query where the predicate part is organized the other way around:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
SQL> select count(1) from t1 where n1 = <span style="color: red;">'1'</span>; ----------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------- | 0 | SELECT STATEMENT | | | | 1 | SORT AGGREGATE | | 1 | |* 2 | INDEX RANGE SCAN| T1_N1_IDX | 1 | ----------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("N1"=<span style="color: red;">1</span>) |
Following the same conversion strategy exposed above, this time Oracle has implicitly converted the data type of the literal value to that of the column. The column n1, being of a number data type, does not have to comply with the data type of the <literal> value. Remember the mantra: strings are always converted to numbers and not vice versa. Notice the absence of the cote surrounding the number 1 in the predicate part indicating the implicit conversion.
Although the first type of implicit data type conversion processed from the column to the literal value can cause dramatic deterioration of the query performance, the second kind of data type conversion operating from the literal value to the column has not pre-empted the index from being used and has not altered the performance of the corresponding query.
You might ask why Oracle has not managed to change the first query in the same way it did for the second one by converting the data type of the literal value to that of the column? Something like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
SQL> select count(1) from t1 where v1 = <span style="color: red;">to_char(1)</span>; ----------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------- | 0 | SELECT STATEMENT | | | | 1 | SORT AGGREGATE | | 1 | |* 2 | INDEX RANGE SCAN| T1_V1_IDX | 1 | ----------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("V1"='1') |
Indeed that is a good question, for which I have no answer to other than the above implicit data type conversion direction which is always processed from string to numbers.
Let’s now consider a date column in the predicate of a similar query:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
SQL> alter session set nls_date_format='DD/MM/YYYY HH24:MI:SS'; SQL> select count(1) from t1 where d1 = '26/02/2017 12:22:46'; ----------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------- | 0 | SELECT STATEMENT | | | | 1 | SORT AGGREGATE | | 1 | |* 2 | INDEX RANGE SCAN| T1_D1_IDX | 1 | ----------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("D1"=<span style="color: red;">TO_DATE</span>(' 2017-02-26 12:22:46', 'syyyy-mm-dd hh24:mi:ss')) |
Again, as expected, Oracle uses the implicit data type conversion direction to convert the character expression of the predicate ('26/02/2017 12:22:46'
) to the date column part (d1) of the same predicate. Since it is the data type of the expression that has been aligned with that of the column then the index has been used. Slightly changing the above query to the following one:
1 |
SQL> select count(1) from t1 where d1 = TIMESTAMP '2017-02-26 12:22:46'; |
Yields the following execution plan:
1 2 3 4 5 6 7 8 9 10 11 12 |
------------------------------------------- | Id | Operation | Name | Rows | ------------------------------------------- | 0 | SELECT STATEMENT | | | | 1 | SORT AGGREGATE | | 1 | |* 2 | TABLE ACCESS FULL| T1 | 10 | ------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(<span style="color: red;">INTERNAL_FUNCTION</span>("D1")=TIMESTAMP' 2017-02-26 12:22:46.000000000') |
Note how Oracle has applied an INTERNAL_FUNCTION
in order to implicitly convert the data type of the d1 column (date) to that of the expression (timestamp). This example has paved the way excellently to announce a third implicit data type conversion used by Oracle: dates are implicitly converted to timestamp when compared together in a query predicate:
1 |
DATE → TIMESTAMP |
In order to avoid such implicit conversion the above timestamp expression has to be converted to a date either by using a to_date function or a cast operator as shown below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
SQL> select count(1) from t1 where d1 = cast (TIMESTAMP '2017-02-26 12:22:46'as date); ----------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------- | 0 | SELECT STATEMENT | | | | 1 | SORT AGGREGATE | | 1 | |* 2 | INDEX RANGE SCAN| T1_D1_IDX | 1 | ----------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("D1"=CAST(TIMESTAMP' 2017-02-26 12:22:46.000000000' AS date)) |
4. MS-SQL data type implicit conversion example
In the preceding sections I wrote that other relational databases are using implicit data type conversion hierarchy as well. In the below table and query I have demonstrated this conversion direction in MS-SQL Server 2016:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
create table t1 (n1 integer ,v1 char(10) ,d1 date); insert into t1 select TOP 10000 row_number() over (order by a.name) n1 ,row_number() over (order by a.name) v1 ,(getdate() +ROW_NUMBER() over (order by a.name) )d1 from sys.all_objects a cross join sys.all_objects b; create clustered index t1_v1_idx on t1(v1); |
In the next two queries I am going to select from the above t1 table comparing the v1 character column to an expression of the same data type (‘1’) in the first query and comparing the same column to an expression of a different data type in the second query (1):
1 2 |
select count(1) from t1 where v1 ='1'; select count(1) from t1 where v1 = 1; |
The execution plans of the above two queries are shown below:
Notice how in the top execution plan there is an index seek (equivalent to the Oracle index range scan) operation indicating that the implicit conversion has been done from the expression data type (char) to that of the column (number), generating a cost of 8%.
Notice as well how in the bottom execution plan there is a much higher cost of 92%, together with a Clustered Index Scan (the equivalent of Oracle full table scan) because of the conversion happening from the column data type to that of the expression. Additionally, if you hoover over the yellow symbol with your mouse a new warning pop-up window will appear indicating that a non-desired implicit data type conversion has been performed.
5. Conclusion
Understanding the hierarchy and the direction in which an implicit data type conversion is going to operate is very important in order to know whether this conversion will harm the query performance or not. While this article has not covered all the implicit data type conversion possibilities it has however highlighted few of the most plausible implicit data type conversions occurring in today’s live systems. The bottom line is that we should always ensure that the column and the expression on each side of the predicate operator are of the same data type. Or, in case of forced implicit conversion, ensure that the conversion is done from the data type of the expression to that of the column and not in the other way around.
Load comments