I have my Oracle Support account configured to send me a daily email of “hot topics”, and I’ve set it up so that the email lists the 100 bugs that have been updated most recently by Oracle support. One of the bugs that came up on 9th Jan was described as: “Bug 18700681 : DROP COLUMNS USING ALTER TABLE VERY SLOW”. My first thought when I saw this was that it wasn’t a bug it was expected behaviour (with the caveat that “very slow” might actually mean “even slower than the ‘very slow’ that I was expecting”)
I was a little curious so I logged on to MoS to see what the bug note said and found (to my complete lack of surprise) that the status was set to “32 – Not a Bug. To Filer”; then I noticed that it did, nevertheless, raise a couple of interesting little issues.
When you issue a “drop column” command Oracle has to read every block of the table, then rewrite it after removing the target column from every row; and the table in question was roughly 2TB in size (240,000,000 blocks at 8KB) with a few partitions – and the drop command had been running for 8 hours.
- Is this an unreasonable amount of time?
- If I tell you this was on an Exadata X3-2 does that change your answer to (1)?
- Have you considered what might happen if the table is compressed with OLTP compression?
- What about Hybrid Columnar Compression?
As a little extra aid to trouble-shooting it’s worth noting that most of the waits reported from a 10046 trace were “cell multiblock physical read” and “resmgr:cpu quantum” – does that surprise you, does it give you a better idea of what’s going on in this case?
I can’t give you the answers to all the questions we could ask about this example, and I don’t know whether that “8 hours and running” was a natural consequence of how this table was constructed and how Resource Manager had been configured; but I will tell you something about dropping columns and how to choose and test your strategy if you ever need to do it on a production system.
When I started to write this note I did a search on my blog for “drop column”, and found that I had a draft on the topic which I had started in 2007 and updated in 2012; and in that draft I referred back to two previous answers I had given on the topic, one to a posting on the OTN database forum from 2011 and the other to a question on the Oracle-L list server in 2002. Both questions were about the volume of undo and redo generated by the drop column command. Conveniently my draft pointed me back to a script I had written (for Oracle 8i, to check something that I had written in my first book “Practical Oracle 8i”) to demonstrate the feature – and that script was still relevant to Oracle 12c.
To “remove” a column from a table you can either mark it as “unused”, or you can drop it. Both these options are irreversible (short of recovering an older version of the database), but the former simply updates the data dictionary quickly and cheaply, the latter actually removes all the column data from the entire table as well. Technically, of course, you could achieve a similar effect to “unused” by setting the column to “invisible” in the latest versions of Oracle – and this would also be a reversible change if you later realized it was a mistake.
Examples of syntax:
alter table t set unused column c; alter table t set unused (c); alter table t set unused (c1, c2); alter table t drop column c; alter table t drop (c); alter table t drop (c1, c2) checkpoint 2000; alter table t drop unused columns; alter table t drop unused columns checkpoint; alter table t drop unused columns checkpoint 5000; alter table t drop columns continue checkpoint; alter table t drop columns continue checkpoint 5000;
For the syntax diagram and further details of options here’s the precise URL from the 22.214.171.124 manual.
There are some restrictions on dropping or marking columns as unused – for example you can’t drop a column that’s part of the partition key of a partitioned table (for obvious reasons), similarly you can’t drop a column that’s part of the primary key of an index organized table, and there are other restrictions relating to constraints and virtual columns.
On the other hand there are cases where dropping a column is allowed to have large side effects – for example I can drop all the columns in the primary key of a table, at which point the primary key index will be dropped at the same time. Of course I have a little problem if there’s a table with a foreign key referencing that primary key – but then I just have to remember that there’s a “cascade constraints” option – consider this examples which starts with a table called parent with a primary key of (id) and a table called child with a primary key of (id_p, id) where child.id_p references parent.id as a foreign key:
SQL> alter table parent drop column id; alter table parent drop column id * ERROR at line 1: ORA-12992: cannot drop parent key column SQL> alter table parent drop column id cascade constraints; Table altered. SQL> alter table child drop (id_p); alter table child drop (id_p) * ERROR at line 1: ORA-12991: column is referenced in a multi-column constraint SQL> alter table child drop (id_p, id); Table altered.
I couldn’t drop the parent primary key until I cascaded the constaint then, as Oracle cascades the drop on the parent the parent index and the child foreign key constraint disappear. I then had a problem dropping a single column from the child table because it was part of the child’s primary key, but when I dropped both primary key columns at once I had no problem.
When you drop a column, Oracle reads every row of every block in the table, and rewrites that row after removing the dropped column from it. If you drop multiple columns at the same time Oracle treats each column in the row as a separate update, and each change generates undo and redo. Note that if you mark a column as unused there is no such work – but if you subsequently drop a second column Oracle silently does the work for both columns.
The table is locked exclusively while the drop is executing and the entire drop is executed as a single massive transaction unless you include a “checkpoint” clause. In the syntax examples above we saw two versions of the checkpoint clause, one with and one without a number – if you omit the number the default value is 512 which tells Oracle to commit after every 512 rows modified, with the unusual feature that the commit does NOT release the table lock.
The question on OTN all those years ago was this:
In my 10.2.0.3 I’ve got table size 90 GB which suffers from :
ORA-12986: columns in partially dropped state. Submit ALTER TABLE DROP COLUMNS CONTINUE
I need to know how to estimate redo size of the:
alter table DROP COLUMNS CONTINUE checkpoint 10000 .
Is 2x table size good estimate?
And my reply was this:
“As a very rough guide for simple columns, the rollback generated for each row will be about 100 bytes plus the average column size being dropped, the redo will be around 250 bytes plus the average column size – even when the table is a NOLOGGING table.”
This introduces a new idea – the “continue” option. If you’ve used the checkpoint syntax with your drop command then you could kill the session (or shutdown the database, perhaps) halfway through the drop. The table will not be available for DML or query , but it is in a safe state and you can finish off the “drop columns” command using the “continue” (with an optional checkpoint) option. (Strangely there doesn’t seem to be a “suspend drop” command, so it looks like the only reason for using the “continue” option would be to handle a crash of some sort – whether accidental or deliberate.)
I gave some figures in my OTN answer, but what I failed to say at the time was that they represented the approximate workload if you are dropping just ONE column. How did I get the figures, and what happens if there are more columns? Let’s just do the experiment – first creating a simple test table:
create table t1 as with generator as ( select --+ materialize rownum id from dual connect by level <= 1e4 ) select rownum id, 'XX' v2, lpad('x',30) v30, rpad('x',100) padding from generator v1, generator v2 where rownum <= 1e6 ;
I’m going to be using 126.96.36.199 for the demonstration so I don’t need to gather stats at this point. Here’s the basic code I’m going to use to do some testing:
alter table t1 set unused column v2; alter table t1 set unused column v30; execute snap_enqueues.start_snap execute snap_rollstats.start_snap execute snap_my_stats.start_snap alter table t1 drop unused columns checkpoint 1000 ; execute snap_my_stats.end_snap execute snap_rollstats.end_snap execute snap_enqueues.start_snap
The calls to the “snap” packages are simple procedures I use to take snapshot of some of the dynamic performance views and report the differences resulting from the test. In this case I’m choosing to look at v$enqueue_stat (enqueue/lock statistics), v$rollstat (basic rollback/undo segment statistics), and a join between v$statname and v$mystat (my session statistics). I won’t report all the statistics, I’ll just tell you a little about some of the figures. The code allows for several variations – with or without the checkpoint 1000; marking just v2 unused, marking just v30 unused, marking both unused.
The really significant results of the test are the following (from the session stats):
188.8.131.52 - dropping v2 with no checkpoint ========================================= Name Value ---- ----- db block changes 2,010,425 redo entries 1,000,231 redo size 236,735,648 undo change vector size 79,845,720 table scan rows gotten 1,008,525 HSC Heap Segment Block Changes 1,000,011 184.108.40.206 - dropping v30 with no checkpoint ========================================== Name Value ---- ----- db block changes 2,034,190 redo entries 1,020,470 redo size 266,507,104 undo change vector size 107,840,960 table scan rows gotten 1,007,385 HSC Heap Segment Block Changes 1,000,008 220.127.116.11 - dropping both with no checkpoint =========================================== Name Value ---- ----- db block changes 4,048,432 redo entries 2,023,290 redo size 503,565,712 undo change vector size 187,929,544 table scan rows gotten 1,008,297 HSC Heap Segment Block Changes 2,000,011
The most obvious point we can see is that the block changes, undo and redo for dropping the two columns at the same time is the sum of the individual drops; and this isn’t because we do two table scans (one for each column); we do one table scan of 1M rows, but we handle the changes for the first column then handle the changes for the second column separately – so dropping two columns gives us two sets of undo and redo overhead per row.
The second point we can pick up is the size (approximately) of the overhead per row per column. The actual data sizes for the columns in the two individual tests are 2 bytes and 30 bytes respectively, and with 1 million rows affected we see that the undo change vector size changes from 79.8MiB to 107.8MiB, that’s a change of 28MiB – i.e. 1 million x 28 bytes. So the undo overhead is (approximately) 77,800,000 which means about 78 bytes per column per row (which may vary slightly with version of Oracle).
Similarly the figures for the redo size are 236.7Mib, 266.5 MiB, 503.6Mib: the increment from v2 to v30 is about 30 bytes per row, so if we subtract a couple of extra bytes per row from the figures for v2 we can infer that the redo overhead is about 234 bytes per column per row – so drop 4 columns from a table and your redo is going to be in the order of 1KB per row! The “bug” that kicked this article off told us about the number of blocks in the table that would have to be read and re-written – but it may be the number of rows and dropped columns that is even more important.
The figures don’t change very much for the test with the checkpoint – the most important change is that instead of one undo segment getting all the undo writes, Oracle rotates round the available undo segments, switching segment after every 1,000 rows of data, so we don’t see one segments stealing all the extents from the rest of the segments and don’t run the risk of the process crashing due to a full undo tablespace. It’s possible, of course, that this checkpoint strategy could result in other long-running sessions getting an ORA-01555 error, “snapshot too old” if the drop manages to overwrite the undo that other sessions need.
With the checkpoint in place the enqueues snapshot shows us roughly 1,000 TX enqueues taken and released – but we have to set event 10704 to see that the TM enqueue on the underlying table is held in mode 6 (exclusive) for the duration.
You can’t drop a column in a table that you’ve created with basic compression – but you can if the table is “compress for OLTP” (or “row store compress advanced” as the newer syntax puts it). Interestingly when I tested this by dropping column v2 in a compressed version of my demonstration table the redo size was about 2KB – clearly Oracle doesn’t really drop the column, it’s just a data dictionary adjustment. Perhaps the same is true for Hybrid Columnar Compression on Exadata – it seems reasonable to assume that it would be.
This leads on to a new topic – and one which I will be writing about in the near future. The manuals make the following comments about dropping columns:
- 11g manual: “This provides a convenient means to free space in a database …”
- 12c manual: “… lets you free space in the database by dropping columns ..”
Neither manual is correct (particularly with respect to the word “convenient”). When you drop columns from a table you don’t restore space to “the database” you may not even be able to re-use the space for new rows in the same table if the column you’ve dropped was quite small.
If you want to reclaim space in the tablespace, or even use the space that has become available in the table safely you’re going to have to take further steps after dropping the column(s). This may mean you decide to stick with marking columns unused until you have an opportunity to rebuild the table properly.
Next time: I’ll be looking at this problem and the related problem of “massive deletes”.
Footnote: My series on the Cost Based Optimizer will be continued in the near future.