{"id":82596,"date":"2018-12-24T18:38:21","date_gmt":"2018-12-24T18:38:21","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=82596"},"modified":"2022-04-24T21:00:00","modified_gmt":"2022-04-24T21:00:00","slug":"saving-data-to-various-formats-using-sql","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/saving-data-to-various-formats-using-sql\/","title":{"rendered":"Saving Data to Various Formats, Using SQL"},"content":{"rendered":"<p>I enjoy pulling the data out of <em>AdventureWorks<\/em>. It is a great test harness. What is the quickest way of doing it? Well, everyone knows it is native BCP, but how much faster is that format than tab-delimited or comma-delimited BCP? Can we quickly output data in XML? Is there a way of outputting array-in-array JSON reasonably quickly? How long does it take to output in MongoDB&#8217;s Extended JSON? Of course, the answer is going to vary from system to system, and across versions, but any data about this is usually welcome.<\/p>\n<p>In addition to these questions, I wanted to know more about how much space these files take up, either raw or zipped. We\u2019re about to find out. We\u2019ll test all that, using good ol\u2019 BCP and SQLCMD.<\/p>\n<p>My motivation for doing this was to explore ways of quickly transferring data to MongoDB. to test out a way of producing array-in-array JSON at a respectable turn of speed. It turned out to be tricky. The easy and obvious ways were slow.<\/p>\n<h2>How? Why?<\/h2>\n<p>The reason I like <em>AdventureWorks<\/em> is that it has a bit of an assault course there. There are illegal column names, reserved words used as columns, CLR types, and free text search. 
Not as much of this as I\u2019d like, of course, but it makes things a bit more tricky, so it is good for testing, along with old <em>Pubs<\/em> with its awkward obsolete data types.<\/p>\n<p>Because the SQL Server implementations of XML and JSON are rather delicate creatures that quail at the smell of a CLR datatype, or a column name with an embedded space, we generally need some supporting logic, rather than belting the tables out to file just by <code>SELECT *<\/code>.<\/p>\n<p>As most people know, you can use BCP very simply, just specifying each table. Here we output in tab-delimited format in UTF-8, which is a format that Excel likes. It is safe as long as you don\u2019t include tab characters in your spreadsheet cells! Note that <code>xp_cmdshell<\/code> must be enabled to try this example.<\/p>\n<pre class=\"theme:ssms2012-simple-talk font-size:14 lang:tsql decode:true \">--over-ride this path if you need to.\r\nDECLARE @ourPath sysname = 'OurPath\\TabDelimited\\';\r\nDECLARE @Database sysname = Db_Name(); \r\nDECLARE @command NVARCHAR(4000)= '\r\nEXECUTE xp_cmdshell ''bcp  ?  out '+@ourPath +'?.Tab -c -C 65001 -d'\r\n    +@Database+' -T -S '+@@servername +''', NO_OUTPUT;'\r\nEXECUTE sp_msforeachtable @command<\/pre>\n<p>Native and \u201ccomedy-limited\u201d formats aren\u2019t much harder. Proper CSV format output from the tables of a database is only possible via ODBC.<\/p>\n<h2>Creating the Test Harness<\/h2>\n<p>For this task, I set up several directories on the server for the different types of output.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"668\" height=\"215\" class=\"wp-image-82605\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/12\/word-image-154.png\" \/><\/p>\n<p>\nThen I ran the script from The Code section of this article, which wrote out the data from <em>AdventureWorks<\/em> into each directory in the different formats. 
I included within the batch all the necessary temporary procedures, along with a routine that switched in the configuration that allowed me to use <code>xp_cmdshell<\/code> and then switched it off again.<\/p>\n<p>Into each directory were written all the files, one for each table, like this.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"350\" height=\"531\" class=\"wp-image-82606\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/12\/word-image-155.png\" \/><\/p>\n<p>One can, and of course should, open these files up and check them. Here is the tab-delimited format.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"611\" height=\"611\" class=\"wp-image-82607\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/12\/word-image-156.png\" \/><\/p>\n<p>OK, it would be good not to have the bracketed identifiers in the filename <em>[Human Resources].[Employee].Tab<\/em>, perhaps, but they are legal. If they irritate you, it is very quick and easy to remove them.<\/p>\n<pre class=\"theme:powershell-ise font-size:14 wrap:true lang:ps decode:true \">Get-ChildItem -path 'MyDirectory' -Filter \"*[*]*.*\" -Recurse |\r\n    Rename-Item -NewName {$_.name -replace '(\\[|\\])',''}<\/pre>\n<h2>Mixed Usefulness of the Different Formats<\/h2>\n<p>Not all these formats are particularly useful. Tab-delimited is fine just so long as you don\u2019t have tabs in any strings. The comma-separated format isn\u2019t CSV. If you run \u2026<\/p>\n<pre class=\"theme:ssms2012-simple-talk font-size:14 wrap:true wrap-toggle:false lang:tsql decode:true\">SQLCMD -d AdventureWorks2016 -E -u  -s, -W -Q \"set nocount on; SELECT * from (values ('This text has commas, like this, and so should be delimited.','comedy-limited, nice but unreliable'),(',,,,,','This text has no commas'))f(one,two)\"\r\n\r\n<\/pre>\n<p>\u2026 you will see instantly that it hasn\u2019t coped. 
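By way of contrast, here is a quick sketch, in Python rather than SQL, of what a proper CSV writer has to do with these same two test rows. This is only an illustration of RFC-4180-style quoting, and is not part of the test batch.

```python
import csv
import io

# The same two test rows used in the SQLCMD example above.
rows = [
    ("This text has commas, like this, and so should be delimited.",
     "comedy-limited, nice but unreliable"),
    (",,,,,", "This text has no commas"),
]

# Python's csv module applies RFC-4180-style quoting: any field that
# contains the delimiter is wrapped in double quotes, so the commas
# inside the values survive a round trip.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(("one", "two"))
writer.writerows(rows)

# Reading the text back recovers exactly the fields we wrote.
buffer.seek(0)
header, *recovered = list(csv.reader(buffer))
print(recovered[0][0])  # the commas are still inside a single field
```

A writer that merely joins values with commas, as SQLCMD does, cannot make that guarantee.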
It isn\u2019t CSV.<\/p>\n<pre class=\"theme:none font-size:12 line-height:16 lang:batch decode:true\">one,two\r\n---,---\r\nThis text has commas, like this, and so should be delimited.,comedy-limited, nice but unreliable\r\n,,,,,,This text has no commas<\/pre>\n<h2>The Results<\/h2>\n<p>The XML via BCP and the native BCP both worked very rapidly. Tab-delimited and comma-delimited took a little bit longer. SQLCMD and the object-in-array JSON were slightly laggard but respectable. Extended JSON had a third more text to write than ordinary JSON. Bringing up the rear was the array-in-array format. I originally tried exporting array-in-array JSON via XML. However, it turned out to be too slow. Although XML is quick to export, the overhead of converting it to JSON is considerable. The alternative method of using <code>OpenJSON<\/code> was also prohibitively slow, taking two or three minutes.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-83194\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/12\/JSONOutput.png\" alt=\"\" width=\"361\" height=\"190\" \/><\/p>\n<p>A word of caution with these results: they can vary considerably, even on the same server and general setup.<\/p>\n<p>The performance of array-in-array JSON was as good as I could manage to get. CLR, or even PowerShell, could easily do better; I\u2019d prefer to wait for SQL Server to support the format. Still, this method takes only five times as long as the fastest, native BCP, so it is usable for most purposes.<\/p>\n<p>What about the space they take up?<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-83204\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2018\/12\/Results.png\" alt=\"\" width=\"546\" height=\"304\" \/><\/p>\n<p>The advantage of the array-in-array format of JSON is that the text takes half the space of the object-in-array format. 
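The arithmetic behind this is easy to demonstrate outside SQL Server. Here is a small Python sketch, using made-up rows rather than <em>AdventureWorks<\/em> data, showing that object-in-array JSON repeats every column name in every row while array-in-array JSON states only the values, and that compression removes most of that difference:

```python
import json
import zlib

# A hypothetical table: 1,000 rows with three columns.
columns = ["BusinessEntityID", "FirstName", "LastName"]
rows = [[i, "First%d" % i, "Last%d" % i] for i in range(1000)]

# Object-in-array repeats every column name in every row...
object_in_array = json.dumps([dict(zip(columns, r)) for r in rows])
# ...while array-in-array states the values only.
array_in_array = json.dumps(rows)

raw_oia, raw_aia = len(object_in_array), len(array_in_array)
zip_oia = len(zlib.compress(object_in_array.encode("utf-8")))
zip_aia = len(zlib.compress(array_in_array.encode("utf-8")))

# The repeated keys inflate the raw size, but they compress away
# almost entirely, so the zipped sizes end up close together.
print(raw_oia, raw_aia, zip_oia, zip_aia)
```

The raw array-in-array text is far smaller, but once zipped the two are much the same size, which mirrors the results above.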
This advantage almost vanishes when the files are zipped, though.<\/p>\n<p>XML and JSON object-in-array formats both take around twice as much space as the simpler tabular formats, which is unsurprising as they are more versatile in the structure of the information they can hold. The large space used by comma-delimited files is due to the fact that I had difficulty storing the file in UTF-8 format, so I opted for UTF-16 Unicode instead. This penalty disappeared when the files were zipped. Comma-delimited format was only there for the sake of completeness.<\/p>\n<p>All the formats, when zipped, take roughly the same space. This shouldn\u2019t be a surprise because they all held exactly the same information. It is just that some formats repeated it a lot!<\/p>\n<h2>The Code<\/h2>\n<p>So, here is the SQL to run the comparative tests. They take some time to run: something like three and a half minutes for <em>AdventureWorks<\/em>. You\u2019ll see that we start by ensuring that we are allowed to use <code>xp_cmdshell<\/code>, changing the configuration if necessary. Then we run the tests and finally, if necessary, close off the configuration to prevent it from being used.<\/p>\n<p>To run this, change the path values to a directory on your server if required. Remember that you are writing these files to your server, not the workstation.<\/p>\n<p>I describe the methodology for the test harness, and how to set it up as a SQL Prompt snippet, in the article <a href=\"https:\/\/www.red-gate.com\/hub\/product-learning\/sql-prompt\/record-t-sql-execution-times-using-sql-prompt-snippet\">\u2018How to record T-SQL execution times using a SQL Prompt snippet\u2019<\/a>.<\/p>\n<p>These routines rely on temporary stored procedures, which are all stored on GitHub, and I\u2019ll keep them updated. They are here, at <a href=\"https:\/\/github.com\/Phil-Factor\/JSONSQLServerRoutines\">JSONSQLServerRoutines<\/a>, along with several others that I use for processing JSON. 
For this example, create the <code>#SaveJsonDataFromTable,\u00a0#SaveExtendedJsonDataFromTable, <\/code>\u00a0and <code>#ArrayInArrayJsonDataFromTable<\/code> temporary stored procedures in the same query window before running the following code.<\/p>\n<pre class=\"theme:ssms2012-simple-talk font-size:14 lang:tsql decode:true\">\/* lets just check the config and make sure that xp_cmdshell is enabled. *\/\r\nDECLARE @Settings TABLE (\r\n    name sysname, minimum INT, Maximum INT, config_value INT, run_value INT);\r\nINSERT INTO @Settings (name, minimum, Maximum, config_value, run_value)\r\n  EXECUTE sp_configure @configname = 'show advanced options';\r\nIF NOT EXISTS (\r\n  SELECT * FROM @Settings WHERE name = 'show advanced options'\r\n  AND run_value = 1)\r\n  BEGIN\r\n    EXECUTE sp_configure 'show advanced options', 1;\r\n    RECONFIGURE;\r\n  END;\r\nINSERT INTO @Settings (name, minimum, Maximum, config_value, run_value)\r\n  EXECUTE sp_configure @configname = 'xp_cmdshell';\r\nIF NOT EXISTS (\r\n  SELECT * FROM @Settings WHERE name = 'xp_cmdshell'\r\n  AND run_value = 1)\r\n  BEGIN\r\n    EXECUTE sp_configure 'xp_cmdshell', 1;\r\n    RECONFIGURE;\r\n  END;\r\ngo\r\n\/* start of timed run *\/\r\nDECLARE @log TABLE \r\n  (TheOrder INT IDENTITY(1,1), \r\n  WhatHappened varchar(200),\r\n  WHENItDid  Datetime2 DEFAULT GETDATE())\r\nUSE adventureworks2016\r\n----start of timing\r\nINSERT INTO @log(WhatHappened) SELECT 'Starting Writing out every table'\r\n--place at the start\r\nDECLARE @ourPath9 sysname = 'C:\\Data\\RawData\\AdventureWorks\\XML\\';\r\nDeclare @command9 NVARCHAR(4000)= \r\n'EXEC xp_cmdshell ''bcp \"SELECT * FROM ? 
FOR XML AUTO\"  queryout \"'\r\n     +@ourPath9+'?.xml\" -S '+@@Servername+' -T -c -C 65001 -t'', NO_OUTPUT;'\r\nEXECUTE sp_msforeachtable @command9\r\nINSERT INTO @log(WhatHappened) \r\n    SELECT 'Writing out every table as XML  took '\r\n    --log the time taken to get to this point\r\n\/* first lets try using JSON array--in-array format *\/\r\nDECLARE @ourPath1 sysname = 'C:\\Data\\RawData\\AdventureWorks\\JSONArrayInArray\\';\r\nDeclare @command1 NVARCHAR(4000)= '\r\nDECLARE @Json NVARCHAR(MAX)\r\nEXECUTE #ArrayInArrayJsonDataFromTable @TableSpec=''?'',@JSONData=@json OUTPUT\r\nCREATE TABLE ##myTemp (Bulkcol nvarchar(MAX))\r\nINSERT INTO ##myTemp (Bulkcol) SELECT @JSON\r\nEXECUTE xp_cmdshell ''bcp ##myTemp out \"'+@ourPath1\r\n     +'?.JSON\" -c -C 65001 -S '+@@Servername+' -T '', NO_OUTPUT;\r\nDROP TABLE ##myTemp'\r\nEXECUTE sp_msforeachtable @command1\r\nINSERT INTO @log(WhatHappened) \r\nSELECT 'Writing out every table as Array-in-Array JSON took '\r\n--log the time taken to get to this point\r\nDECLARE @ourPath2 sysname = 'C:\\Data\\RawData\\AdventureWorks\\JSONObjectInArray\\';\r\nDECLARE @command2 NVARCHAR(4000)= '\r\nDECLARE @Json NVARCHAR(MAX)\r\nEXECUTE #SaveJsonDataFromTable @TableSpec=''?'',@JSONData=@json OUTPUT\r\nCREATE TABLE ##myTemp (Bulkcol nvarchar(MAX))\r\nINSERT INTO ##myTemp (Bulkcol) SELECT @JSON\r\nEXECUTE xp_cmdshell ''bcp ##myTemp out '+@ourPath2\r\n    +'?.JSON -c -C 65001 -S '+@@Servername+' -T '', NO_OUTPUT;\r\nDROP TABLE ##myTemp'\r\nEXECUTE sp_msforeachtable @command2\r\nINSERT INTO @log(WhatHappened) \r\nSELECT 'Writing out every table as Object-in-Array JSON took '\r\n--log the time taken to get to this point\r\nDECLARE @ourPath3 sysname = 'C:\\Data\\RawData\\AdventureWorks\\TabDelimited\\';\r\nDECLARE @Database3 sysname = Db_Name(); --over-ride this if you need to.\r\nDECLARE @command3 NVARCHAR(4000)= '\r\nEXECUTE xp_cmdshell ''bcp  ?  
out '+@ourPath3+'?.Tab -c -C 65001 -d'\r\n    +@Database3+' -S '+@@Servername+' -T '', NO_OUTPUT;'\r\nEXECUTE sp_msforeachtable @command3\r\nINSERT INTO @log(WhatHappened) \r\nSELECT 'Writing out every table as tab-delimited BCP took '\r\n--log the time taken to get to this point\r\nDECLARE @ourPath4 sysname = 'C:\\Data\\RawData\\AdventureWorks\\NativeBCP\\';\r\nDECLARE @Database4 sysname = Db_Name(); --over-ride this if you need to.\r\nDECLARE @command4 NVARCHAR(4000)= '\r\nEXECUTE xp_cmdshell ''bcp  ?  out '+@ourPath4+'?.data -N -d'\r\n    +@Database4+' -S '+@@Servername+' -T '', NO_OUTPUT;'\r\nEXECUTE sp_msforeachtable @command4\r\nINSERT INTO @log(WhatHappened) \r\nSELECT 'Writing out every table as native BCP took '\r\n--log the time taken to get to this point\r\nDECLARE @ourPath6 sysname = 'C:\\Data\\RawData\\AdventureWorks\\CSV\\';\r\nDECLARE @Database6 sysname = Db_Name(); --over-ride this if you need to.\r\nDECLARE @command6 NVARCHAR(4000)= '\r\nEXECUTE xp_cmdshell ''sqlcmd  -d '\r\n    +@Database6+' -u -E -h -1 -s, -W -Q \"set nocount on; SELECT * FROM ?\" -S '\r\n    +@@Servername+' -o '+@ourPath6+'?.CSV'', NO_OUTPUT;'\r\nEXECUTE sp_msforeachtable @command6\r\nINSERT INTO @log(WhatHappened) \r\nSELECT 'Writing out every table as SQLCMD comedy-limited took '\r\n--log the time taken to get to this point\r\nDECLARE @ourPath7 sysname =  'C:\\Data\\RawData\\AdventureWorks\\ExtendedJSON\\';\r\nDeclare @command7 NVARCHAR(4000)= '\r\nDECLARE @Json NVARCHAR(MAX)\r\nEXECUTE #SaveExtendedJsonDataFromTable @TableSpec=''?'',@JSONData=@json OUTPUT\r\nCREATE TABLE ##myTemp (Bulkcol nvarchar(MAX))\r\nINSERT INTO ##myTemp (Bulkcol) SELECT @JSON\r\nEXECUTE xp_cmdshell ''bcp ##myTemp out \"'+@ourPath7\r\n     +'?.JSON\" -c -C 65001 -S '+@@Servername+' -T '', NO_OUTPUT;\r\nDROP TABLE ##myTemp'\r\nEXECUTE sp_msforeachtable @command7\r\nINSERT INTO @log(WhatHappened) \r\nSELECT 'Writing out every table as Extended JSON took 
'\r\n--log the time taken to get to this point\r\nSELECT ending.whathappened, \r\n    DateDiff(ms, starting.whenItDid,ending.WhenItDid) AS ms \r\nFROM @log starting\r\nINNER JOIN @log ending ON ending.theorder=starting.TheOrder+1\r\nUNION all\r\nSELECT 'Total', DateDiff(ms,Min(WhenItDid),Max(WhenItDid))  \r\nFROM @log \r\nORDER BY ms asc\r\n--list out all the timings\r\ngo\r\nDECLARE @Settings TABLE \r\n    (name sysname, minimum INT, Maximum INT, config_value INT, run_value INT);\r\nINSERT INTO @Settings (name, minimum, Maximum, config_value, run_value)\r\n  EXECUTE sp_configure @configname = 'show advanced options';\r\nIF NOT EXISTS (\r\n  SELECT * FROM @Settings WHERE name = 'show advanced options'\r\n  AND run_value = 1)\r\n  BEGIN\r\n    EXECUTE sp_configure 'show advanced options', 1;\r\n    RECONFIGURE;\r\n  END;\r\nINSERT INTO @Settings (name, minimum, Maximum, config_value, run_value)\r\n  EXECUTE sp_configure @configname = 'xp_cmdshell';\r\nIF NOT EXISTS (\r\n  SELECT * FROM @Settings WHERE name = 'xp_cmdshell'\r\n  AND run_value = 0)\r\n  BEGIN\r\n    EXECUTE sp_configure 'xp_cmdshell', 0;\r\n    RECONFIGURE;\r\n  END; <\/pre>\n<h2>Conclusions<\/h2>\n<p>It is very useful to run tests like these, and I caught several bugs while doing so. However, I much prefer testing things rather than waving my hands and arguing.<\/p>\n<p>What I\u2019d conclude from all this is that JSON is a good way of storing data and isn\u2019t really any bulkier as a format than any other way of doing it. When data is zipped, it seems to boil down to the same size whatever the format you use. Obviously, Native BCP is the quickest and neatest way of \u2018dumping\u2019 data but it isn\u2019t easy to read, and you cannot prepare data from another source and import it as native BCP because it isn\u2019t a data transport medium. 
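The danger with delimited formats is easy to demonstrate. Here is a short Python sketch (an illustration only, not part of the batch) showing how a single tab character inside a value silently adds a column to naive tab-delimited output, of the sort that bcp's character mode produces, while quoted CSV round-trips the same row safely:

```python
import csv
import io

# One row of two columns, where a value contains a tab character.
row = ["Widget\tDeluxe", "99.95"]

# Naive tab-delimited output: the embedded tab is indistinguishable
# from a field separator, so the row now appears to have three
# columns instead of two.
tab_line = "\t".join(row)
naive_fields = tab_line.split("\t")

# Quoted CSV, by contrast, round-trips the row intact, because any
# value containing the delimiter would be quoted on the way out.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
buffer.seek(0)
recovered = next(csv.reader(buffer))

print(len(naive_fields), len(recovered))  # prints "3 2"
```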
If you want an unzipped tabular format with the robustness of JSON and the size of CSV, then JSON array-in-array is the one to go for.<\/p>\n<p>None of the delimited formats are safe, in that it is always possible that a column value could contain an unescaped character that is used as a delimiter. CSV, in contrast, would be fine if there were a reliable way of doing it in BCP. If you want a safe way of transporting data, then XML or JSON is an obvious choice, but who would use XML when there is an alternative?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You have many options when exporting data from a database. In this article, Phil Factor compares several methods including XML and array-in-array JSON for speed and file size.&hellip;<\/p>\n","protected":false},"author":154613,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[143531],"tags":[5134],"coauthors":[6813],"class_list":["post-82596","post","type-post","status-publish","format-standard","hentry","category-t-sql-programming-sql-server","tag-sql-prompt"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/82596","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/154613"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=82596"}],"version-history":[{"count":19,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/82596\/revisions"}],"predecessor-version":[{"id":83206,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/82596\/revisions\/83206"}],"wp:attachment":[{"href":"http
s:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=82596"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=82596"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=82596"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=82596"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}