{"id":895,"date":"2010-05-26T00:00:00","date_gmt":"2010-05-26T00:00:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/oracle-to-sql-server-crossing-the-great-divide-part-2\/"},"modified":"2021-08-16T15:02:16","modified_gmt":"2021-08-16T15:02:16","slug":"oracle-to-sql-server-crossing-the-great-divide-part-2","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/learn\/oracle-to-sql-server-crossing-the-great-divide-part-2\/","title":{"rendered":"Oracle to SQL Server: Crossing the Great Divide, Part 2"},"content":{"rendered":"<div id=\"pretty\">\n<h1>Data Generation Strategies<\/h1>\n<p>Good, scalable, database performance is largely about putting the data in the right place, being able to access it efficiently, and avoiding unnecessary overheads. At the end of the <a href=\"http:\/\/www.simple-talk.com\/sql\/learn-sql-server\/oracle-to-sql-server-crossing-the-great-divide,-part-1\/\">previous article<\/a>, I had given myself the task of building a large data set quickly so that I could start examining some of the strategies for data placement in SQL Server 2008 to identify their strengths and weaknesses.<\/p>\n<p>Initially all I wanted to do was emulate the action of an SQL statement I often use in Oracle to generate a few million rows of data, and this article is the story of my wanderings through the SQL Server landscape to find a reasonable, though not necessarily perfect, way of generating the type of data I needed.<\/p>\n<h1>A Million Row Table: the Oracle way<\/h1>\n<p>Although this was discussed in some detail in <a href=\"http:\/\/www.simple-talk.com\/sql\/learn-sql-server\/oracle-to-sql-server-crossing-the-great-divide,-part-1\/\">Part 1<\/a>, Listing 1 demonstrates, in Oracle code, the sort of data generation code I was aiming for.<\/p>\n<pre>execute dbms_random.seed(0)\r\n\u00a0\r\ncreate table t1\r\nas\r\nwith generator as (\r\n\u00a0\u00a0\u00a0 select\u00a0\u00a0\u00a0\u00a0\u00a0 rownum\u00a0\u00a0\u00a0\u00a0\u00a0 id\r\n\u00a0\u00a0\u00a0 from\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 dual\r\n\u00a0\u00a0\u00a0 connect by\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 rownum &lt;= 1000\r\n)\r\nselect\r\n\u00a0\u00a0\u00a0 rownum\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0 id,\r\n\u00a0\u00a0\u00a0 trunc((rownum-1)\/50)\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 clustered,\r\n\u00a0\u00a0\u00a0 mod(rownum,20000)\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 scattered,\r\n\u00a0\u00a0\u00a0 trunc(dbms_random.value(0,20000))\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 randomized,\r\n\u00a0\u00a0\u00a0 trunc(sysdate) + dbms_random.value(-180, 180)\u00a0\u00a0 random_date,\r\n\u00a0\u00a0\u00a0 dbms_random.string('l',6)\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 random_string,\r\n\u00a0\u00a0\u00a0 lpad(rownum,10,0)\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 vc_small,\r\n\u00a0\u00a0\u00a0 rpad('x',100,'x')\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 vc_padding\r\nfrom\r\n\u00a0\u00a0\u00a0 generator\u00a0\u00a0 g1,\r\n\u00a0\u00a0\u00a0 generator\u00a0\u00a0 g2\r\nwhere\r\n\u00a0\u00a0\u00a0 rownum &lt;= 1000000\r\n;\r\n\u00a0\r\nalter table t1 add constraint t1_pk primary key(id);\r\ncreate index t1_clu on t1(clustered);\r\ncreate index t1_sca on t1(scattered);\r\ncreate index t1_ran on t1(randomized);\r\ncreate index t1_dat on t1(random_date);\r\ncreate index t1_str on t1(random_string);\r\n<\/pre>\n<p class=\"caption\">Listing 1: Generating a million row table in Oracle<\/p>\n<p>This example is going to give me 1,000,000 rows of data in a simple heap table, with six B-tree indexes, one of which is built implicitly by the addition of the <span class=\"STCodeinTextChar\">primary<\/span> <span class=\"STCodeinTextChar\">key<\/span> constraint. I have a primary key that has been generated by the pseudo-column <span class=\"STCodeinTextChar\">rownum<\/span>, a column of randomly generated strings that will be upper case (&#8216;U&#8217;) and six characters in length, a column of randomly generated dates (with time to the nearest second) covering 360 days and centered on &#8220;today&#8221;, and three numeric columns that are going to hold 20,000 distinct integer values with 50 rows of each value (although that&#8217;s an approximation for the column called &#8220;randomized&#8221;). I needed numeric columns with different patterns in their data distribution because data location has a big impact on the work needed to access the data. I&#8217;ve also included a couple of simple character columns, one small, one larger, to make the row lengths look as if they might represent some realistic data and to give me something that isn&#8217;t indexed that I can select or update.<\/p>\n<p>Running this code on a small 2.0 GHz laptop, it takes about one minute to create the table, and about eight seconds for each index (mostly time spent re-reading table blocks). If I remove all the columns that make use of the use <span class=\"STCodeinTextChar\">dbms_random<\/span> package the time to create the table drops to about 7 seconds, so in practice I make sure that I don&#8217;t generate any columns or indexes that I don&#8217;t really need, for any given test.<\/p>\n<p>Finally, it&#8217;s worth noting that Oracle, unlike SQL Server, has to be explicitly instructed to collect statistics about a table after it has been created, although it does collect statistics on the indexes as you create them. As such, I&#8217;ve missed out a step in my code that would be critical to Oracle&#8217;s Cost Based Optimizer.<\/p>\n<p>In <a href=\"http:\/\/www.simple-talk.com\/sql\/learn-sql-server\/oracle-to-sql-server-crossing-the-great-divide,-part-1\/\">part 1<\/a> I outlined most of the problems with translating this code to SQL Server so I won&#8217;t cover them again here. Many of them were little variations in syntax, function names and features and were relatively easy to overcome by diligent searching of the manuals. However, three of the issues were especially relevant to the strategies I attempted:<\/p>\n<ul>\n<li><b>SQL Server doesn&#8217;t have an equivalent of the <span class=\"STCodeinTextBold\">dbms_random<\/span><span class=\"STBold\"> procedure, to generate random data<\/span><br \/>\n<\/b>I&#8217;ve yet to overcome this one; essentially I&#8217;ll have to find one or write my own. However, I decided to worry about that later when I really need some randomized data. In this article, I&#8217;ll just work on ideas for generating the scattered and clustered data patterns. (Update: since writing the article I&#8217;ve discovered that using the <span class=\"STCodeinTextChar\">newid()<\/span> function is a popular way to generate random data).<\/li>\n<li><b>SQL Server doesn&#8217;t recognize the <span class=\"STCodeinTextBold\">create as select&#8230;<\/span><span class=\"STBold\"> syntax<\/span><br \/>\n<\/b>I have to do &#8220;create table&#8221;, &#8220;insert as select &#8230;&#8221; instead, which is a minor detail and easy to address. In addition, in the discussion on the previous article a couple of readers pointed out the option for a simple &#8220;<span class=\"STCodeinTextChar\">select&#8230;into&#8230;from<\/span>&#8221; to create and populate a table. This mechanism copies <span class=\"STCodeinTextChar\">not null<\/span> constraints but doesn&#8217;t handle other constraints, and doesn&#8217;t allow you to define a key as you create the table. However, that&#8217;s okay with me because at present I&#8217;m not too worried about looking closely at indexes.<\/li>\n<li><b>SQL Server doesn&#8217;t have a <span class=\"STCodeinTextBold\">connect<\/span> <span class=\"STCodeinTextBold\">by clause<br \/>\n<\/span><\/b>I had to find a convenient, alternative way of generating a rowsource to play with. As you can\u00a0 see, my strategy with Oracle is to use common table expression (subquery factoring in Oracle speak) to generate a fairly small volume of data, then introduce a Cartesian join (cross product) to expand that to a much larger volume; I&#8217;ll explain why later.<\/li>\n<\/ul>\n<p>The following sections describe the various approaches that I attempted. Unless stated otherwise, the code is run using SQLCMD.<\/p>\n<h1>Strategy 1: Row-by-Row<\/h1>\n<p>I started with a strategy that I knew was probably a very bad idea (it certainly would be in Oracle): create an empty table and use a program loop to insert one row at a time. This is an easy way to get the counter that I need for my <span class=\"STCodeinTextChar\">mod()<\/span> and <span class=\"STCodeinTextChar\">trunc()<\/span> games.<\/p>\n<p>I got a little surprise here in that the only loop construct in T-SQL seems to be <span class=\"STCodeinTextChar\">WHILE&#8230;END<\/span>. I also had a little trouble finding an assignment operator; initially, it seemed as if the only option was to <span class=\"STCodeinTextChar\">SELECT<\/span> into variables, but eventually I found the <span class=\"STCodeinTextChar\">SET<\/span> operator. Running a few timing tests, the two options seem to perform identically, so maybe they&#8217;re effectively the same piece of code behind the scenes, anyway.<\/p>\n<p>Stripped to the minimum, Listing 2 shows what I came up with.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE TABLE big_table\r\n\u00a0\u00a0\u00a0 (\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 id INT ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 clustered_data INT ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 scattered_data INT ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 vc_small VARCHAR(10) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0 vc_padding VARCHAR(100)\r\n\u00a0\u00a0\u00a0 ) ;\r\ngo\r\n\u00a0\r\nDECLARE @div INT = 50\r\nDECLARE @mod INT = 200\r\nDECLARE @loop INT = 1\r\nDECLARE @limit INT = @div * @mod\r\n\u00a0\r\nBEGIN\r\n\u00a0\u00a0\u00a0 WHILE @loop &lt;= @limit \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 BEGIN\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 INSERT\u00a0 INTO big_table\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( id ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 clustered_data ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 scattered_data ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 vc_small ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 vc_padding\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 VALUES\u00a0 ( @loop ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FLOOR(( @loop - 1 ) \/ @div) , -- trunc()\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( @loop - 1 ) % @mod ,\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -- mod()\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 STR(@loop, 10, 0) ,\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -- to_char()\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 REPLICATE('x', 100)\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 -- rpad()\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SET @loop = @loop + 1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 END\r\nEND\r\nGo\r\n<\/pre>\n<p class=\"caption\">Listing 2: The row-by-row approach<\/p>\n<p>The functions I&#8217;ve used are the closest equivalents to the Oracle functions that I&#8217;ve referenced in the comments, but they&#8217;re not exactly identical. I&#8217;ve set up a few variables, rather than fixing constants into the script; after all, I&#8217;m hoping to create a template that I can tweak easily to generate different volumes and patterns.<\/p>\n<p>In this case, I restricted myself to 50 distinct values, 200 rows of each, for a total of just 10,000 rows. Performance was, as expected, dire: the code took 19.5 seconds to complete. Regardless of which RDBMS you&#8217;re using, doing a big job in lots of little pieces is almost certain to be a very bad idea.<\/p>\n<p>Interestingly, my code above is suffering from the overheads of handling some messaging for every single insert in the loop. In my reading, I stumbled across the <span class=\"STCodeinTextChar\">NOCOUNT<\/span> option, and by adding the <span class=\"STCodeinTextChar\">SET<\/span> <span class=\"STCodeinTextChar\">NOCOUNT<\/span> <span class=\"STCodeinTextChar\">ON<\/span> command to the start of Listing 2, I reduced the run time to 5.5 seconds. This is still slow, of course, but gives an interesting insight into how many little details you need to learn before you can criticize a product!<\/p>\n<div class=\"note\">\n<p class=\"note\"><b>A note on <span class=\"STCodeinTextChar\">NOCOUNT<\/span> option: SQLCMD vs. SSMS<\/b><br \/>\nWhen running my code from SSMS, rather than SQLCMD, the run time was still 5.5 seconds regardless of whether or not I set the <span class=\"STCodeinTextChar\">NOCOUNT<\/span> option, and I did check that the &#8220;one rows processed&#8221; messages still appeared in the message window when <span class=\"STCodeinTextChar\">NOCOUNT<\/span> was off. So SSMS must be using a different method to SQLCMD?<\/p>\n<\/div>\n<h1>Strategy 2: Sys.objects, a CTE and row_number<\/h1>\n<p>The <span class=\"STCodeinTextChar\">sys.all_objects<\/span> view is always going to hold a reasonable number of rows, so why not use that, doing a cross join to itself, to generate a lot of data? I could put the driving query into a common table expression (CTE) before doing the cross join, to make the code look a little simpler. I need a sequential number, so I can play around with the <span class=\"STCodeinTextChar\">row_number()<\/span> analytic function inside the CTE and then use a little arithmetic in the cross join.<\/p>\n<p>Listing 3 puts this approach into practice. It seems to work quite well and generated 1,000,000 rows in about 13.5 seconds, running at close to 100% CPU. The definition of <span class=\"STCodeinTextChar\">big_table<\/span> is the same as it was in Listing 2.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">DECLARE @div INT = 50 ;\r\nDECLARE @mod INT = 20000 ;\r\nDECLARE @limit INT = @div * @mod ;\r\nDECLARE @driver INT = 1000 ;\r\n\u00a0\r\nWITH\u00a0\u00a0\u00a0 generator\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS ( SELECT TOP ( @driver )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 row_number() OVER ( ORDER BY schema_id ) id\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0 sys.all_objects\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0 INSERT\u00a0 INTO big_table\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT TOP ( @limit )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 @driver * ( g1.id - 1 ) + g2.id ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FLOOR(( @driver * ( g1.id - 1 ) + g2.id - 1 ) \/ @div) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( @driver * ( g1.id - 1 ) + g2.id - 1 ) % @mod ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 STR(@driver * ( g1.id - 1 ) + g2.id, 10, 0) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 REPLICATE('x', 100)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0 generator g1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 CROSS JOIN generator g2\r\n\u00a0\u00a0\u00a0 OPTION\u00a0 ( FORCE ORDER ) ;\r\n<\/pre>\n<p class=\"caption\">Listing 3: Cross joining <span class=\"STCodeinTextChar\">sys.all_objects<\/span> into a CTE<\/p>\n<p>There are a couple of things I don&#8217;t like about this code, though. First, it depends on a <span class=\"STCodeinTextChar\">SYS<\/span> object and it&#8217;s possible that I&#8217;m able to see that at present purely because I&#8217;m operating as a privileged user. There&#8217;s also the potential problem that the number of rows in the view may be smaller than I need for bigger tests; in my current test database it&#8217;s less than 2,000 rows.<\/p>\n<p>Another thing I&#8217;m not keen on is putting in that <span class=\"STCodeinTextChar\">FORCE<\/span> <span class=\"STCodeinTextChar\">ORDER<\/span> hint, which I&#8217;ve included to make sure that the optimizer joined the two copies of generator in the right order to allow my arithmetic to produce the expected results). I don&#8217;t really know how the optimizer is going to deal with the CTE; the execution plan happens to work out the way I want at present but that might just be luck, and could change if I ask for fewer or more rows.<\/p>\n<p>For a little extra clarity I like the idea of removing the repetition of the messy bit of arithmetic, involving the two ids, by putting in an inline view. I&#8217;ll demonstrate that in the next example.<\/p>\n<div class=\"note\">\n<p class=\"note\"><b>A note on &#8220;insert&#8230;with&#8230;select &#8230;&#8221; instead of &#8220;with&#8230;insert&#8230;select&#8230;&#8221;<\/b><br \/>\nMy first attempt at using a CTE to create some data ended in failure(&#8220;<span class=\"STCodeinTextChar\">Msg 102,<\/span> <span class=\"STCodeinTextChar\">Level 15,<\/span> <span class=\"STCodeinTextChar\">State<\/span> <span class=\"STCodeinTextChar\">1,<\/span> <span class=\"STCodeinTextChar\">Server<\/span> <span class=\"STCodeinTextChar\">HPBASE,<\/span> <span class=\"STCodeinTextChar\">Line<\/span> <span class=\"STCodeinTextChar\">2<\/span>, <span class=\"STCodeinTextChar\">Incorrect<\/span> <span class=\"STCodeinTextChar\">syntax<\/span> <span class=\"STCodeinTextChar\">near<\/span> <span class=\"STCodeinTextChar\">&#8216;generator&#8217;<\/span>&#8220;. I had used the Oracle pattern, &#8220;insert&#8230;with&#8230;select &#8230;&#8221;, where the CTE is part of the <span class=\"STCodeinTextChar\">select<\/span> statement, rather than &#8220;with&#8230;insert&#8230;select&#8230;&#8221;.<\/p>\n<\/div>\n<h1>Strategy 3: A Recursive CTE<\/h1>\n<p>The CTE looks like a good starting point for generating a lot of data, but I&#8217;d like to avoid dependencies on anything that may be outside the permission set of the current user.<\/p>\n<p>So how about a recursive CTE (a feature that didn&#8217;t appear in Oracle until 11g) that doesn&#8217;t query any objects at all?<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">DECLARE @div INT = 50 ;\r\nDECLARE @mod INT = 20000 ;\r\nDECLARE @limit INT = @div * @mod ;\r\nDECLARE @driver INT = 1000 ;\r\n\u00a0\r\nWITH\u00a0\u00a0\u00a0 generator\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS ( SELECT\u00a0\u00a0 1 AS id\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 UNION ALL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0\u00a0 id + 1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0 generator\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE\u00a0\u00a0\u00a0 id &lt; @driver\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0 INSERT\u00a0 INTO big_table\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0 id ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FLOOR(( id - 1 ) \/ @div) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( id - 1 ) % @mod ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 STR(id, 10, 0) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 REPLICATE('x', 100)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0 ( SELECT TOP ( @limit )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 @driver * ( g1.id - 1 ) + g2.id id\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0\u00a0 generator g1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 CROSS JOIN generator g2\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ) iv\r\n\u00a0\u00a0\u00a0 OPTION\u00a0 ( MAXRECURSION 0, FORCE ORDER ) ;\r\n<\/pre>\n<p class=\"caption\">Listing 4: The recursive CTE approach<\/p>\n<p>This solution performs reasonably well, generating a million rows in about 13 seconds, which is the best yet. However, I had many false starts before reaching this solution, and in fact temporarily discarded the idea completely because I couldn&#8217;t make it work. On the first couple of very small test runs the code seemed to work, but it crashed when I finally set <span class=\"STCodeinTextChar\">@driver<\/span> to 1,000 to generate my million rows, producing the error message:<\/p>\n<pre>Msg 530, Level 16, State 1, Line 8\r\nThe statement terminated. The maximum recursion 100 has been exhausted before statement completion.\r\n<\/pre>\n<p>At the time, I didn&#8217;t consider the possibility that the recursion level was a soft limit that could be set by a hint and it wasn&#8217;t until a few weeks after my first attempt that I heard about the hint, during a conversation with a SQL Server expert. The recursion level can be set to any value up to 32,767, after which a setting of zero means &#8220;no limit&#8221;.<\/p>\n<p>The recursion level wasn&#8217;t the only little problem I had with the query: on my first attempt to use the inline view I ran into this error:<\/p>\n<pre>Msg 156, Level 15, State 1, Line 1\r\nIncorrect syntax near the keyword 'option'.\r\n<\/pre>\n<p>When I removed the <span class=\"STCodeinTextChar\">option()<\/span> clause, the error changed to:<\/p>\n<pre>Msg 102, Level 15, State 1, Line 32\r\nIncorrect syntax near ';'\r\n<\/pre>\n<p>It took me 15 minutes staring at the text before I noticed that I hadn&#8217;t given the inline view an alias; this is not a requirement in Oracle and it actually causes an Oracle error to be reported in some cases if you do include one. It always seems to be the smallest things that waste the most time when you&#8217;re in an unknown environment!<\/p>\n<div class=\"note\">\n<p class=\"note\"><b>A note on ANSI join support in SQL Server vs. Oracle<\/b><br \/>\nT SQL Server syntax for ANSI joins is far better than Oracle&#8217;s. It was a nasty surprise when I first started to write some ANSI code and found that things like the aliases for inline joins and columns, which SQL Server allows, simply didn&#8217;t work in Oracle.<\/p>\n<\/div>\n<p>Before we move on to the next strategy, you may be wondering why Listing 4 (and my original Oracle source) uses a small recursive CTE with a cross join rather than just using a single large CTE in a statement like that shown in Listing 5 (which gets rid of the <span class=\"STCodeinTextChar\">FORCE ORDER<\/span> hint).<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">WITH\u00a0\u00a0\u00a0 generator\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS ( SELECT\u00a0\u00a0 1 AS id\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 UNION ALL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0\u00a0 id + 1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0 generator\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE\u00a0\u00a0\u00a0 id &lt; @driver\u00a0\u00a0\u00a0 -- set to something BIG\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0 INSERT\u00a0 INTO big_table\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0 id ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FLOOR(( id - 1 ) \/ @div) ,\r\n\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0( id - 1 ) % @mod ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 STR(id, 10, 0) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 REPLICATE('x', 100)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0 generator g1\r\n\u00a0\u00a0\u00a0 OPTION\u00a0 ( MAXRECURSION 0 ) ;\r\n<\/pre>\n<p class=\"caption\">Listing 5: A simpler, but much slower, CTE approach<\/p>\n<p>The answer is that recursion is expensive in most coding environments, so I had simply assumed it probably would be in SQL Server 2008. In my Oracle example, the basic cost of generating 1,000,000 rows jumped from 7 seconds to 11 seconds when I switched from a small join to a single use of the CTE; the SQL Server 2008 example jumped from 13 seconds to 48 seconds.<\/p>\n<h1>Strategy 4: A Table-Valued Function<\/h1>\n<p>After temporarily abandoning the recursive CTE approach, I thought I&#8217;d try another &#8220;data-free&#8221; strategy, namely a table function (known in Oracle-speak as a pipelined function). I&#8217;ll start with defining the table function, as shown in Listing 6, and then explain the rationale.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">CREATE FUNCTION table_function ( @i_in AS INTEGER )\r\nRETURNS @table_result TABLE ( id INT )\r\nAS \r\n\u00a0\u00a0\u00a0 BEGIN\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 DECLARE @loop INT = 1 ;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHILE @loop &lt;= @i_in \r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 BEGIN\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 INSERT\u00a0 INTO @table_result\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 VALUES\u00a0 ( @loop ) ;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SET @loop = @loop + 1 ;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 END ;\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 RETURN\r\n\u00a0\u00a0\u00a0 END\r\nGo\r\n<\/pre>\n<p class=\"caption\">Listing 6: The TVF approach<\/p>\n<p>The table function has a return type which, as its name suggests, is a table and I have, in effect, declared a variable of type table and then written a simple loop that inserts rows into this table (in a variant of this test, I used the recursive CTE mechanism to generate the result set for the table function). The table name is <span class=\"STCodeinTextChar\">@table_result<\/span>, which indicates, through the <span class=\"STCodeinTextChar\">@<\/span> symbol, that the table is a form of temporary table.<\/p>\n<div class=\"note\">\n<p class=\"note\"><b>A note on temporary tables<\/b> <br \/>\nOne of the comments on the first article in this series pointed out that there are three different types of temporary table: <span class=\"STCodeinTextChar\">#table_name<\/span> for local scope, <span class=\"STCodeinTextChar\">##table_name<\/span> for global scope, and <span class=\"STCodeinTextChar\">@table_name<\/span> for inner-local scope, so I&#8217;ve got plenty of work to do on checking how they work,what they cost , and whether I&#8217;ve used the right type here.<\/p>\n<\/div>\n<p>To use this function, I simply need to tweak the definition of my CTE, as shown in Listing 7.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">WITH\u00a0\u00a0\u00a0 generator\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS ( SELECT\u00a0\u00a0 id\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0 table_function(@driver)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0 INSERT\u00a0 INTO big_table\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0 id ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FLOOR(( id - 1 ) \/ @div) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( id - 1 ) % @mod ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 STR(id, 10, 0) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 REPLICATE('x', 100)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0 ( SELECT TOP ( @limit )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0@driver * ( g1.id - 1 ) + g2.id id\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0\u00a0 generator g1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 CROSS JOIN generator g2\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ) iv\r\n\u00a0\u00a0\u00a0 OPTION\u00a0 ( FORCE ORDER ) ;\r\n<\/pre>\n<p class=\"caption\">Listing 7: Using the TVF with the <span class=\"STCodeinTextBold\">generator<\/span> CTE<\/p>\n<p>This code took 15 seconds to complete, which is 2 seconds longer than the recursive CTE. I had assumed that the difference would be virtually non-existent, but perhaps the 2 seconds is the impact of the temporary table, or a difference due to the memory calls made for a table function. If this wasn&#8217;t just for a quick and dirty data builder I&#8217;d need to find out why the difference was as large as it was. In the meantime, though, I&#8217;ll just put this anomaly on my to-do list.<\/p>\n<p>Again it&#8217;s worth a quick check to see what the impact would be of changing the cross-join into a single call to the table function for 1,000,000 rows: the answer is similar to the example with the recursive CTE; the run time changed from 15 seconds to 48 seconds. Again, we see that building a large object in memory can be very CPU intensive, while the SQL approach, in effect generating and consuming in a continuous stream, is much more efficient.<\/p>\n<h1>Strategy 5: Using a Temp Table with the CTE<\/h1>\n<p>At this point, two things occurred to me. First, I really needed to know how SQL Server was generating the data, so an investigation into generating and reading execution plans had become very important. Secondly, it struck me that I had been thinking too much in terms of Oracle-style solutions.<\/p>\n<p>I had to try to think in terms of what might be a natural &#8220;SQL Server way&#8221; of attacking this problem, and probably there were a lot of SQL Server-style things I should have done before I got to this point (I&#8217;m sure you&#8217;ll let me know!).<\/p>\n<p>Perhaps if I stopped playing around with recursive CTEs and table functions I could do something more efficient. The code in Listing 8 looks very similar to Listing 4 but there is an important intermediate step that I thought might make a difference.<\/p>\n<pre class=\"theme:ssms2012 lang:tsql\">WITH\u00a0\u00a0\u00a0 generator\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 AS ( SELECT\u00a0\u00a0 1 AS id\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 UNION ALL\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0\u00a0 id + 1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0 generator\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 WHERE\u00a0\u00a0\u00a0 id &lt; @driver\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 )\r\n\u00a0\u00a0\u00a0 SELECT\u00a0 *\r\n\u00a0\u00a0\u00a0 INTO\u00a0\u00a0\u00a0 #generator\r\n\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0 generator\r\nOPTION\u00a0 ( MAXRECURSION 0 ) ;\r\n\u00a0\r\nINSERT\u00a0 INTO big_table\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 SELECT\u00a0 id ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FLOOR(( id - 1 ) \/ @div) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ( id - 1 ) % @mod ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 STR(id, 10, 0) ,\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 REPLICATE('x', 100)\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0 ( SELECT TOP ( @limit )\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 @driver * ( g1.id - 1 ) + g2.id id\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 FROM\u00a0\u00a0\u00a0\u00a0\u00a0 #generator g1\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 CROSS JOIN #generator g2\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ) iv\r\nOPTION\u00a0 ( FORCE ORDER ) ;\r\n\u00a0\r\ngo\r\n<\/pre>\n<p class=\"caption\">Listing 8: Using a temp table with the CTE<\/p>\n<p>The difference is that I&#8217;ve created a temporary table to hold my driving data set. One of the mantras of the Oracle enthusiast is that &#8220;you don&#8217;t need temporary tables&#8221; (it&#8217;s not true of course, but we usually hide them behind CTEs and non-mergeable views). In contrast, I&#8217;d noticed that in SQL Server they are quite openly and commonly used. With this temporary table in place I can be confident (I think!) that I won&#8217;t be re-generating a recursive CTE 1,000 times, or calling a table function 1,000 times. So, if any of my strategies so far was going to be noticeably quicker than the rest, my money was on this one. I was wrong. In fact, this version of the code took slightly longer than all the other versions. I have no idea why, of course, but maybe it&#8217;s just something to do with a slight difference in the way the temporary table is used in this case; time to start learning about XEvents, perhaps?<\/p>\n<h1>Concluding notes<\/h1>\n<p>After a few hours of playing around, I&#8217;ve got a way of generating useful volumes of data. I haven&#8217;t found a mechanism that is quite as fast as Oracle, but the timing is in the same ballpark, and when I got around to adding the <span class=\"STCodeinTextChar\">PRIMARY KEY<\/span> constraint and other indexes, the timing advantage swung the other way.<\/p>\n<p>An interesting point here is that in Oracle, I added the <span class=\"STCodeinTextChar\">PRIMARY<\/span> <span class=\"STCodeinTextChar\">KEY<\/span> constraint with a single command but in SQL Server I had to add a <span class=\"STCodeinTextChar\">NOT NULL<\/span> constraint first. Is this a significant difference? When you know where to look (and I do when I&#8217;m using Oracle) you can uncover all sorts of interesting details. In fact, behind the scenes, adding a <span class=\"STCodeinTextChar\">PRIMARY<\/span> <span class=\"STCodeinTextChar\">KEY<\/span> constraint in Oracle requires two tablescans, one to add the <span class=\"STCodeinTextChar\">NOT NULL<\/span> constraint and one to create the index. So the database goes through the same amount of work as SQL Server, but Oracle is hiding two actions behind one command.<\/p>\n<p>Of the various mechanisms I&#8217;ve tried, I&#8217;m going to use the recursive CTE of strategy 3 in the future, simply because it&#8217;s the most self-contained of the options I&#8217;ve considered.<\/p>\n<p>I still need to sort out a reasonable method of generating pseudo-random data though, and I am curious about the join method that SQL Server has selected for the cross join that I&#8217;ve been using in all my non-procedural examples, so it&#8217;s nearly time to start looking closely at execution plans.<\/p>\n<p>I&#8217;ll finish off with a couple of points about differences and similarities between the million row tables built by Oracle and SQL Server:<\/p>\n<ul>\n<li>In Oracle my table spanned 18,261 blocks (pages) of 8KB while SQL Server spanned only 16,950 pages<br \/>\n<span class=\"STItalic\">Note to self: by default Oracle leaves 10% free space in a block for row lengths to increase on update, does SQL Server do something similar by default?<\/span> <span class=\"STItalic\">(It&#8217;s got a fillfactor for indexes)<\/span><\/li>\n<li>In Oracle, the primary key index spanned 2,087 blocks (again with 10% free space; the equivalent of &#8220;<span class=\"STCodeinTextChar\">fillfactor 90 pad_index off<\/span>&#8221; in SQL Server) while SQL Server spanned 2,228 pages. <span class=\"STItalic\">Note to self: is this because of a different storage strategy, or because of a different free space allocation?<\/span>\n<p>&nbsp;<\/p>\n<\/li>\n<\/ul>\n<p>In the next couple of articles I will be looking more closely at where my data is, and how SQL Server can choose to access it &#8211; because without this type of information it&#8217;s not possible to design an efficient database.<\/p>\n<hr \/>\n<p class=\"note\"><b>If all you want to do is to generate test data in a table incredibly quickly &#8230;.\u00a0<\/b><br \/>\n&#8220;I generated better data in only seconds&#8230;&#8221; Michael Gaertner of Quintech was impressed with SQL Data Generator, a tool for populating databases with realistic test data. <a href=\"http:\/\/www.red-gate.com\/specials\/SQL_Data_Generator\/2010\/experts.htm?utm_source=simpletalk&amp;utm_medium=article&amp;utm_content=JohnDataGen&amp;utm_campaign=sqldatagenerator\">Download your free trial of SQL Data Generator.<\/a><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>A well-known Oracle expert records faithfully his struggles with the  unfamiliar : SQL Server.  He now sets himself the task of creating a table with a million rows of random data.  As one might expect, it is the lack of familiarity with the workarounds and tricks of SQL Server that trips him up.  His journey brings us fresh insights, and a glimpse at the alternative-universe of Oracle.&hellip;<\/p>\n","protected":false},"author":101205,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143525],"tags":[5176,4149,5199,4150,4151],"coauthors":[39048],"class_list":["post-895","post","type-post","status-publish","format-standard","hentry","category-learn","tag-jonathan-lewis","tag-learn-sql-server","tag-oracle-to-sql-server","tag-sql","tag-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/895","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/101205"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=895"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/895\/revisions"}],"predecessor-version":[{"id":77204,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/895\/revisions\/77204"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=895"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=895"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=895"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=895"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}