When you are storing large strings, or any other data types that you’d want to store as VARCHAR or VARBINARY, there are factors that affect how and where that data is stored inside SQL Server. This in turn will affect the performance of inserts and updates.
Basics: Allocation Unit Types, Row and Page Size Limits
SQL Server’s unit of physical storage is a page. Each page has a fixed size of 8K bytes (ref). Where possible, data is stored in rows within these ‘pages’. A row cannot overlap a page boundary. If the space allocated to a datatype is fixed, or if the variable sized data is small enough to allow the row to fit, the base allocation unit type, “IN_ROW_DATA”, is used. However, there are two ways of storing data that would otherwise overflow a page boundary.
- When a single row grows too large to be stored in-row, data can be offloaded to “ROW_OVERFLOW_DATA” pages.
- When a single column stores more than 8,000 bytes, or if the developer chooses to force this behavior, data is stored in “LOB_DATA” pages.
This becomes much clearer through examples.
Exercising Row and Page Size Limits
Single String Column with VARCHAR(n) and VARCHAR (MAX)
8,000 Characters
First, let’s build two tables and exercise some long strings to see how SQL Server handles both normal and large strings. Here, we will be exercising row size limits.
1 2 3 4 5 6 7 8 |
CREATE TABLE demo.OneColVarcharN (Col1 VARCHAR(8000)); CREATE TABLE demo.OneColVarcharMax (Col1 VARCHAR(MAX)); INSERT INTO demo.OneColVarcharN (Col1) SELECT REPLICATE('x', 8000); INSERT INTO demo.OneColVarcharMax (Col1) SELECT REPLICATE('x', 8000); |
After each insert, we will use versions of the following query to see into which type of page these records were loaded (see MSDN for details on the DMV):
1 2 3 4 5 6 7 8 9 10 |
SELECT OBJECT_NAME([object_id]) AS TableName, alloc_unit_type_desc AS AllocUnitTp, page_count AS PgCt, avg_page_space_used_in_percent AS AvgPgSpcUsed, record_count AS RcdCt, min_record_size_in_bytes AS TableName,, max_record_size_in_bytes AS MaxRcdSz, forwarded_record_count AS FwdRcdCt FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL , 'DETAILED'); |
The query results show that only “IN_ROW_DATA” pages were written for both tables, and the pages are 98.97% full.
TableName | AllocUnitTp | PgCt | AvgPgSpcUsed | RcdCt | MinRcdSz | MaxRcdSz | FwdRcdCt |
OneColVarcharMax | IN_ROW_DATA | 1 | 98.97 | 1 | 8011 | 8011 | 0 |
OneColVarcharN | IN_ROW_DATA | 1 | 98.97 | 1 | 8011 | 8011 | 0 |
8,001 Characters
Clearly we can’t insert 8,001 characters into an 8,000-character column, but we can in the VARCHAR(MAX) column.
1 2 |
INSERT INTO demo.OneColVarcharMax (Col1) SELECT REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 8001); |
The query results here show that pages were written to both “IN_ROW_DATA” and “LOB_DATA” pages.
TableName | AllocUnitTp | PgCt | AvgPgSpcUsed | RcdCt | MinRcdSz | MaxRcdSz | FwdRcdCt |
OneColVarcharMax | IN_ROW_DATA | 1 | 0.43 | 1 | 35 | 35 | 0 |
OneColVarcharMax | LOB_DATA | 1 | 99.02 | 1 | 8015 | 8015 | NULL |
Note that the “in row” record is very small. In this case, the only thing stored “in row” is the 24-byte pointer. The full 8,001-character string is moved to the LOB page.
So SQL Server is storing normal VARCHAR(n) columns and VARCHAR(MAX) columns “in row” by default. When VARCHAR(MAX) exceeds 8,000 characters, the pointer is stored “in row”, and the string is stored in “LOB” pages.
Two String Columns with VARCHAR (n) and VARCHAR(MAX)
Let’s try this again but with two columns. This time, we will exercise page size limits.
8,000 Characters
As expected, two 4,000-character strings fit fine in both tables.
1 2 3 4 5 6 7 8 |
CREATE TABLE demo.TwoColVarcharN (Col1 VARCHAR(8000), Col2 VARCHAR(8000)); CREATE TABLE demo.TwoColVarcharMax (Col1 VARCHAR(MAX), Col2 VARCHAR(MAX)); INSERT INTO demo.TwoColVarcharN (Col1, Col2) SELECT REPLICATE('x', 4000), REPLICATE('x', 4000); INSERT INTO demo.TwoColVarcharMax (Col1, Col2) SELECT REPLICATE('x', 4000), REPLICATE('x', 4000); |
The query results show that only “IN_ROW_DATA” pages are written for each table, and the pages are 99.00% full.
8,060 Characters
The maximum number of bytes per page is 8,060. Let’s see how this works.
It actually doesn’t require the full 8,060 characters to fill that page. 8,047 characters get written to just “in row” pages, but at 8,048 characters, it writes to both “in row” pages and either “row overflow” pages (VARCHAR(n)) or “LOB” pages (VARCHAR(MAX)).
Results with 8,047 Characters (4,023 and 4,024 characters)
TableName | AllocUnitTp | PgCt | AvgPgSpcUsed | RcdCt | MinRcdSz | MaxRcdSz | FwdRcdCt |
TwoColVarcharMax | IN_ROW_DATA | 1 | 99.58 | 1 | 8060 | 8060 | 0 |
TwoColVarcharN | IN_ROW_DATA | 1 | 99.58 | 1 | 8060 | 8060 | 0 |
Results with 8,048 Characters (4,024 and 4,024 characters)
TableName | AllocUnitTp | PgCt | AvgPgSpcUsed | RcdCt | MinRcdSz | MaxRcdSz | FwdRcdCt |
TwoColVarcharMax | IN_ROW_DATA | 1 | 50.17 | 1 | 4061 | 4061 | 0 |
TwoColVarcharMax | LOB_DATA | 1 | 49.89 | 1 | 4038 | 4038 | NULL |
TwoColVarcharN | IN_ROW_DATA | 1 | 50.17 | 1 | 4061 | 4061 | 0 |
TwoColVarcharN | LOB_DATA | 1 | 49.98 | 1 | 4038 | 4038 | NULL |
Notice that at 8,047 characters, the record size is exactly 8,060 bytes. What is in that 23 bytes? “Each data page contains a page header which stores page meta-data, such as database file identifier, current page number, previous and next page numbers, number of free bytes per page and so forth.” (ref)
200,000 Characters
For the VARCHAR(MAX) table, the results are basically the same even if the column width is pushed far past the 8,000-character limit. Here, we write two 100,000-character strings.
1 2 3 4 |
INSERT INTO demo.TwoColVarcharMax (Col1, Col2) SELECT REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 100000), REPLICATE(CONVERT(VARCHAR(MAX), 'x'), 100000); |
Results
TableName | AllocUnitTp | PgCt | AvgPgSpcUsed | RcdCt | MinRcdSz | MaxRcdSz | FwdRcdCt |
TwoColVarcharMax | IN_ROW_DATA | 1 | 0.75 | 1 | 61 | 61 | 0 |
TwoColVarcharMax | LOB_DATA | 27 | 91.89 | 28 | 228 | 8054 | NULL |
Here, we have a single page storing two pointers and 27 pages storing the large strings.
In-Row vs. Out-of-Row LOB Storage
For “large-value data types” like VARCHAR(MAX), SQL Server allows the data to be stored in-row (up to 8,000 bytes) or out-of-row (ref).
We will use two new tables, one in-row and one out-of-row:
1 2 |
CREATE TABLE demo.OneColVarcharMaxIn (Col1 VARCHAR(MAX)); CREATE TABLE demo.OneColVarcharMaxOut (Col1 VARCHAR(MAX)); |
To change a VARCHAR(MAX) column from the default (in row) to out-of-row, execute this statement:
1 |
EXEC sp_tableoption 'demo.OneColVarcharMaxOut', 'large value types out of row', 1; |
1 Character
Writing a single-character string to each column in these tables creates an interesting result:
TableName | AllocUnitTp | PgCt | AvgPgSpcUsed | RcdCt | MinRcdSz | MaxRcdSz | FwdRcdCt |
OneColVarcharMaxIn | IN_ROW_DATA | 1 | 0.15 | 1 | 12 | 12 | 0 |
OneColVarcharMaxOut | IN_ROW_DATA | 1 | 0.33 | 1 | 27 | 27 | 0 |
OneColVarcharMaxOut | LOB_DATA | 1 | 1.04 | 1 | 84 | 84 | NULL |
The in-row table created just one page. The out-of-row page created one page for the pointer and one page for the out-of-row string. Notice the byte count difference: the in-row record takes 12 bytes, where the out-of-row record takes 111 bytes.
This illustrates two downsides of out-of-row storage when small records are written:
- Multiple pages are affected regardless of the size of the data being written.
- Extra storage space is required.
8,000 Characters
TableName | AllocUnitTp | PgCt | AvgPgSpcUsed | RcdCt | MinRcdSz | MaxRcdSz | FwdRcdCt |
OneColVarcharMaxIn | IN_ROW_DATA | 1 | 98.97 | 1 | 8011 | 8011 | 0 |
OneColVarcharMaxOut | IN_ROW_DATA | 1 | 0.33 | 1 | 27 | 27 | 0 |
OneColVarcharMaxOut | LOB_DATA | 2 | 50.02 | 2 | 84 | 8014 | NULL |
Notice that the out-of-row table writes two pages instead of just one (“PgCt” column). It actually takes only 65 characters in the out-of-row table to cause it to create the second page.
8,001 Characters
TableName | AllocUnitTp | PgCt | AvgPgSpcUsed | RcdCt | MinRcdSz | MaxRcdSz | FwdRcdCt |
OneColVarcharMaxIn | IN_ROW_DATA | 1 | 0.43 | 1 | 35 | 35 | 0 |
OneColVarcharMaxIn | LOB_DATA | 1 | 99.02 | 126 | 8015 | 8015 | NULL |
OneColVarcharMaxOut | IN_ROW_DATA | 1 | 0.33 | 1 | 27 | 27 | 0 |
OneColVarcharMaxOut | LOB_DATA | 2 | 98.43 | 127 | 84 | 8015 | NULL |
Once the byte count goes over 8,000, the in-row table writes to a LOB page.
1,000,000 Characters
TableName | AllocUnitTp | PgCt | AvgPgSpcUsed | RcdCt | MinRcdSz | MaxRcdSz | FwdRcdCt |
OneColVarcharMaxIn | IN_ROW_DATA | 1 | 0.43 | 1 | 35 | 35 | 0 |
OneColVarcharMaxIn | LOB_DATA | 126 | 98.42 | 126 | 2020 | 8054 | NULL |
OneColVarcharMaxOut | IN_ROW_DATA | 1 | 0.33 | 1 | 27 | 27 | 0 |
OneColVarcharMaxOut | LOB_DATA | 126 | 98.43 | 127 | 84 | 8054 | NULL |
With such large strings, the results are very similar between default (in-row) and out-of-row settings.
Performance
So how do these various choices affect performance? Let’s code some inserts and record results with different settings and values.
INSERT – 0 to 8,000 Characters
First, we’ll test the performance of a process that inserts strings which fit in VARCHAR(n).
The basic idea here is to insert 10,000 rows of data and record the timing. I ran the tests on my desktop. This is not a server-class machine but is a very high-powered desktop. All tests were executed on SQL Server 2008 R2 and Windows 7 Enterprise.
The results here show microseconds per insert for record lengths between 0 and 8,000 characters. The complete collection of test scripts is linked in the References section at the end of the article.
The results here show that VARCHAR(n) is slightly faster than VARCHAR(MAX) (in-row) only for 0-character and 1,000-character strings. At 2,000 characters or above, VARCHAR(MAX) (in-row) is fastest. This result was not expected.
Intuitively, we expect that VARCHAR(n) should be faster than VARCHAR(MAX) (in-row), and in some cases it is. What I think this shows is that, if there is a performance difference, it is small enough that we can’t measure it consistently.
See detailed results in Table 1.
INSERT – Greater Than 8,000 Characters
These tests are limited to VARCHAR(MAX) but compare in-row to out-of-row.
The results here show microseconds per insert for records lengths between 9,000 and 30,000 characters.
UPDATE – Column Limit
When a VARCHAR(MAX) in-row column goes over the 8,000-character limit, the data is moved to a LOB_DATA page. The linked script Allocation Units – UPDATE.sql tests this 8,000-, then 8,001-, then 8,000-character data shift. This test illustrates the move of data from an IN_ROW_DATA page to a LOB_DATA page and then back for the in-row table.
These operations can be expensive, but how expensive? Make a quick mental prediction before reading on.
The linked script PerfTest – UPDATE.sql tests both the 8,000-to-8,001 scenario and the 8,001-to-8,000 scenario. The results were actually quite surprising. I kept checking the code to see if I had done something wrong because the results didn’t match my expectations at all.
UPDATE Test Results
In/Out |
From |
To |
Elapsed Mcs |
Penalty vs Out-of-Row |
In-Row |
8,000 |
8,001 |
34,238,839 |
82% |
Out-of-row |
8,000 |
8,001 |
18,804,768 |
|
In-Row |
8,001 |
8,000 |
9,969,989 |
-47% |
Out-of-row |
8,001 |
8,000 |
18,804,768 |
As expected, the 8,000-to-8,001 update takes longer for in-row than for out-of-row. The surprise is that the 8,001-to-8,000 update is faster for in-row. Based on these results, there must be a shortcut employed inside SQL Server that optimizes this move of data from a LOB_DATA page to an IN_ROW_DATA page.
UNICODE
All of these tests were conducted using non-UNICODE strings. Without compression, all the byte counts have to be doubled for UNICODE strings.
Recommendations
Don’t Use VARCHAR(MAX) Everywhere
One design-simplification technique that has been suggested is to use VARCHAR(MAX) for every string column. Though the tests here show that using VARCHAR(MAX) (in-row) instead of VARCHAR(n) is not a performance problem for strings under 8,000 characters, there are a number of reasons to limit the length of strings in your database.
UI Issues
Anyone with experience creating an application UI knows that handling long strings is difficult. When laying out the UI for an application, it is important to know the expected and maximum length of strings that will be displayed. Leaving this open to whatever someone writes to the database makes designing, building, and testing an application very difficult.
Performance – Writing or Updating Long Strings
From the performance results, you can see that writing long strings definitely affects write delays. While writing a record of any size (even zero characters) takes time, writing longer strings takes more time. As an example, using VARCHAR(n), writing 1,000 characters takes an average of 217 microseconds while writing 8,000 characters takes an average of 448 microseconds.
Compression
What I believe is the most compelling reason to avoid over-8,000-character strings is compression. LOB data never gets compressed.
“When considering data compression one very key thing to remember is that out of row data (LOB data) isn’t compressed. If the LOB data is stored in row, then it will be compressed, but only when you compress the table with PAGE level compression. If however you use ROW level compression then the LOB data will never be compressed, no matter if it is stored in row or if it is stored out of row.” (ref)
Conclusions
- Where appropriate, use VARCHAR(n) over VARCHAR(MAX)
- for reasons of good design if not performance benefits, and
- because VARCHAR(MAX) data does not compress
- Storing large strings takes longer than storing small strings.
- Updating an in-row VARCHAR(MAX) value from below 8,000 to over 8,000 will be relatively slow, but the difference for a single transaction will likely not be measurable.
- Updating an in-row VARCHAR(MAX) value from over 8,000 to below 8,000 will be faster than if the table is set to store data out-of-row.
- Using the out-of-row option for VARCHAR(MAX) will cause slower writes until the strings are very long.
All the scripts used here are attached, so you are encouraged to retest the results.
References
- Row-Overflow Data Exceeding 8 KB from MSDN
- Using Large-Value Data Types from MSDN
- SQL Server 2005 Large Value Data Types by Gregory A. Larsen at Database Journal
- Troubleshooting SQL Server Storage Problems by Denny Cherry at SQL Server Magazine
- Understanding Pages and Extents from MSDN
- Script: Allocation Units – Basics.sql
- Script: Allocation Units – Binary.sql
- Script: PerfTest – INSERT.sql
- Script: PerfTest – UPDATE.sql
Appendix
Table 1
TableName |
MaxRecords |
StrLen |
ElapsedMcs |
VarcharMaxIn |
10,000 |
0 |
1,341,626 |
VarcharMaxOut |
10,000 |
0 |
1,950,038 |
VarcharN |
10,000 |
0 |
1,216,823 |
VarcharMaxIn |
10,000 |
1,000 |
2,246,443 |
VarcharMaxOut |
10,000 |
1,000 |
2,511,649 |
VarcharN |
10,000 |
1,000 |
2,168,441 |
VarcharMaxIn |
10,000 |
2,000 |
2,308,844 |
VarcharMaxOut |
10,000 |
2,000 |
3,104,460 |
VarcharN |
10,000 |
2,000 |
2,823,654 |
VarcharMaxIn |
10,000 |
3,000 |
2,667,651 |
VarcharMaxOut |
10,000 |
3,000 |
3,057,659 |
VarcharN |
10,000 |
3,000 |
2,698,852 |
VarcharMaxIn |
10,000 |
4,000 |
2,839,255 |
VarcharMaxOut |
10,000 |
4,000 |
3,307,264 |
VarcharN |
10,000 |
4,000 |
3,010,858 |
VarcharMaxIn |
10,000 |
5,000 |
3,229,262 |
VarcharMaxOut |
10,000 |
5,000 |
3,525,668 |
VarcharN |
10,000 |
5,000 |
3,603,669 |
VarcharMaxIn |
10,000 |
6,000 |
3,416,466 |
VarcharMaxOut |
10,000 |
6,000 |
4,056,078 |
VarcharN |
10,000 |
6,000 |
3,712,871 |
VarcharMaxIn |
10,000 |
7,000 |
3,728,472 |
VarcharMaxOut |
10,000 |
7,000 |
4,602,088 |
VarcharN |
10,000 |
7,000 |
4,056,078 |
VarcharMaxIn |
10,000 |
8,000 |
4,321,284 |
VarcharMaxOut |
10,000 |
8,000 |
4,945,295 |
VarcharN |
10,000 |
8,000 |
4,477,286 |
Table 2
TableName |
MaxRecords |
StrLen |
ElapsedMcs |
VarcharMaxIn |
10,000 |
9,000 |
5,670,605 |
VarcharMaxOut |
10,000 |
9,000 |
6,326,708 |
VarcharMaxIn |
10,000 |
10,000 |
5,295,689 |
VarcharMaxOut |
10,000 |
10,000 |
5,280,067 |
VarcharMaxIn |
10,000 |
11,000 |
5,717,469 |
VarcharMaxOut |
10,000 |
11,000 |
5,123,852 |
VarcharMaxIn |
10,000 |
12,000 |
6,014,277 |
VarcharMaxOut |
10,000 |
12,000 |
5,623,740 |
VarcharMaxIn |
10,000 |
13,000 |
6,514,166 |
VarcharMaxOut |
10,000 |
13,000 |
6,686,002 |
VarcharMaxIn |
10,000 |
14,000 |
6,576,651 |
VarcharMaxOut |
10,000 |
14,000 |
6,873,460 |
VarcharMaxIn |
10,000 |
15,000 |
6,764,110 |
VarcharMaxOut |
10,000 |
15,000 |
6,748,488 |
VarcharMaxIn |
10,000 |
16,000 |
6,639,137 |
VarcharMaxOut |
10,000 |
16,000 |
7,154,647 |
VarcharMaxIn |
10,000 |
17,000 |
7,279,619 |
VarcharMaxOut |
10,000 |
17,000 |
7,420,213 |
VarcharMaxIn |
10,000 |
18,000 |
7,435,834 |
VarcharMaxOut |
10,000 |
18,000 |
7,089,147 |
VarcharMaxIn |
10,000 |
19,000 |
7,804,550 |
VarcharMaxOut |
10,000 |
19,000 |
7,070,923 |
VarcharMaxIn |
10,000 |
20,000 |
7,336,277 |
VarcharMaxOut |
10,000 |
20,000 |
8,023,077 |
VarcharMaxIn |
10,000 |
21,000 |
8,538,178 |
VarcharMaxOut |
10,000 |
21,000 |
7,976,250 |
VarcharMaxIn |
10,000 |
22,000 |
7,679,677 |
VarcharMaxOut |
10,000 |
22,000 |
8,007,468 |
VarcharMaxIn |
10,000 |
23,000 |
7,617,241 |
VarcharMaxOut |
10,000 |
23,000 |
7,788,941 |
VarcharMaxIn |
10,000 |
24,000 |
7,835,768 |
VarcharMaxOut |
10,000 |
24,000 |
8,194,778 |
VarcharMaxIn |
10,000 |
25,000 |
8,475,741 |
VarcharMaxOut |
10,000 |
25,000 |
8,506,960 |
VarcharMaxIn |
10,000 |
26,000 |
8,819,141 |
VarcharMaxOut |
10,000 |
26,000 |
8,382,087 |
VarcharMaxIn |
10,000 |
27,000 |
9,084,496 |
VarcharMaxOut |
10,000 |
27,000 |
8,881,578 |
VarcharMaxIn |
10,000 |
28,000 |
8,975,233 |
VarcharMaxOut |
10,000 |
28,000 |
9,053,278 |
VarcharMaxIn |
10,000 |
29,000 |
9,505,942 |
VarcharMaxOut |
10,000 |
29,000 |
8,944,014 |
VarcharMaxIn |
10,000 |
30,000 |
9,100,105 |
VarcharMaxOut |
10,000 |
30,000 |
9,022,060 |
Load comments