{"id":70929,"date":"2017-05-05T15:09:32","date_gmt":"2017-05-05T15:09:32","guid":{"rendered":"https:\/\/www.simple-talk.com\/?p=70929"},"modified":"2021-09-29T16:21:15","modified_gmt":"2021-09-29T16:21:15","slug":"visual-checks-data-distributed-sql-server","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/visual-checks-data-distributed-sql-server\/","title":{"rendered":"Visual Checks on How Data is Distributed in SQL Server"},"content":{"rendered":"<p>This article is all about how to go about looking at a visual representation of the way that data is distributed in tables, in a quick unpolished way, while you are programming with SSMS or whatever you prefer to use.<\/p>\n<p>The way that the values of your numerical data is distributed is important. It will often tell <a href=\"https:\/\/en.wikipedia.org\/wiki\/Seven_Basic_Tools_of_Quality\">you if some data is missing, or incorrect<\/a>. It will probably tell you if it has been faked, If you are a database guy, it will tell you how valuable the data would be as an index, or how reusable your cached query plans will be. If you are creating data you can check to see if it has the same distribution as your production data.<\/p>\n<p>You use a histogram to do this. This looks like a series of contiguous rectangles. The independent variable is plotted along the base (x) axis and frequencies on the vertical (y) axis. A histogram depicts the frequencies of observations occurring in certain contiguous ranges of values. These ranges are usually equal.<\/p>\n<p>For this type of histogram that calculates by range, the variable is a \u2018continuous\u2019 rather than a \u2018discrete\u2019 variable. For a discrete variable, aggregating the values into continuous pockets or ranges makes no sense. Suppose we are classifying homes in a particular town according to the number of school-age children living in the house: then we should get a frequency distribution in which the independent variable would take values of 0, 1, 2, 3, 4, etc., according to the number of children living there. This data is exact: we cannot, in science, have less or more than a whole child. This sort of variable is\u2019discrete\u2019, as is any variable which can take only certain restricted values. You are likely to represent them by integers. Continuous variables tend to represent values of something that changes in a continuous way, such as the temperature of an oven, or the weight of sheep.<\/p>\n<p>On the very nature of this distribution lies the assumptions that you can make about the data, such as the level of probability that two samples come from the same population. It is generally very important to know the way that your data is spread. When a distribution of numerical data is organized, you just create a number of evenly-spanned groups or ranges and see how many of your data values fall into each group. You usually order them from the smallest to largest, and then put into graphs and charts to examine them.<\/p>\n<p>Let\u2019s create a very simple table with a single column containing a floating point number. It could be temperature readings people\u2019s height or product sales. Who knows?<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">IF OBJECT_ID('tempdb..#RandomNumbers') IS NOT NULL\r\n  DROP TABLE #RandomNumbers;\r\n  CREATE TABLE #RandomNumbers (number FLOAT); --a simple table of numbers<\/pre>\n<p>Now, we\u2019ll put in some data that is normally distributed.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">DECLARE @Mean FLOAT; --the mean (and median if normally distributed)\r\n  DECLARE @StandardDeviation FLOAT; --(the Standard deviation you want)\r\n  DECLARE @ii INT; --counter\r\n  DECLARE @TotalWanted INT; --counter\r\n  SELECT @Mean = 50, @StandardDeviation = 20, @ii = 1, @TotalWanted = 100000;\r\n  SET NOCOUNT ON;\r\n  \/* now we generate the numbers we want, with the mean and standard deviation we want *\/\r\n  WHILE @ii &lt;= @TotalWanted\r\n    BEGIN\r\n    INSERT INTO #RandomNumbers\r\n      (number)\r\n      SELECT ((RAND() * 2 - 1) + (RAND() * 2 - 1) + (RAND() * 2 - 1))\r\n             * @StandardDeviation + @Mean;\r\n    SELECT @ii = @ii + 1;\r\n    END;<\/pre>\n<p>We now do a very simple query to show the distribution. We will expect to see a bell-jar. We all do this just to get a rough idea of the distribution of the data.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">SELECT REPLICATE(N'\u2588',COUNT(*)\/1000)\r\n  FROM #RandomNumbers \r\n  GROUP BY CONVERT(INT,FLOOR(number))\/5\r\n  ORDER BY CONVERT(INT,FLOOR(number))\/5<\/pre>\n<p>\u2026 giving this in the SSMS result pane\u2026.<\/p>\n<pre class=\"\">  \u2588\r\n  \u2588\u2588\r\n  \u2588\u2588\r\n  \u2588\u2588\u2588\u2588\r\n  \u2588\u2588\u2588\u2588\u2588\r\n  \u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  \u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  \u2588\u2588\u2588\u2588\u2588\r\n  \u2588\u2588\u2588\u2588\r\n  \u2588\u2588\r\n  \u2588\r\n  \u2588<\/pre>\n<p>This bell jar is stuck to the wall. Of course it took me a short while to work out the number of pockets to use and the best divisor for the number of block characters to use for each bar. There is another problem. If there is no data in one of the pockets, it doesn\u2019t show up, because the GROUP BY does not know of the existence of the pocket.<\/p>\n<p>You could throw up your hands and paste the data into Excel, use R, or export the data into a file and use Gnuplot. With the data sizes we are working with, that soon becomes less practical. Also, you often need a quick answer in whatever application you are using to access the data, such as SSMS.<\/p>\n<p>SQL Server actually maintains distribution histograms for indexes and columns. They are, however, specialised for their purpose of predicting the number of results that will be returned from a query. The query optimizer doesn\u2019t maintain fixed ranges or pockets, but adjusts the individual histogram steps to minimize the number of steps in the histogram, up to 200 of them, whilst maximizing the difference between the boundary values. In short, they are useless for our immediate purpose, because it provides only an estimate, and it requires quite a lot of calculation to get a value for a range.<\/p>\n<p>Here is a rather more robust SQL technique that takes the independent variable and converts it in such a way that it can be represented within a matrix of forty (x) by ten (y) rectangles, represented by the space, block and half-block character. Here it is plotting the histogram for our random data.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true\">-- declare all the necessary local variables.\r\n  DECLARE @variable TABLE(value INT);\r\n  DECLARE @Maximum NUMERIC(18,2), @Minimum NUMERIC(18,2) , @MaxIndividualCount INT;\r\n  SELECT @Maximum = MAX(number), @minimum = MIN(number) FROM RandomNumbers;\r\n  --we convert the values to integers, with an origin of zero and range of 40\r\n  INSERT INTO @variable (value)\r\n    SELECT CONVERT(INT, (number - @minimum) \/ (@Maximum - @minimum) * 40.00)\r\n    FROM RandomNumbers;\r\n    -- 40 represents the number of ranges. We adjust to a zero-based origin\r\n    -- to make the plotting easier\r\n    -- we need to know the value of the largest pocket (the highest frequency)\r\n  SELECT @MaxIndividualCount = MAX(f.IndividualCount)\r\n    FROM\r\n      (SELECT COUNT(*) AS IndividualCount\r\n         FROM @variable\r\n         GROUP BY [@variable].value\r\n      ) f;\r\n  SELECT CASE WHEN y=10 THEN CONVERT(CHAR(8),@MaxIndividualCount)+N'\u2502' --we add in the highest frequency \r\n  \t\t\tWHEN y=3 THEN CONVERT(CHAR(8),@minimum)+N'\u2502' -- the top of the Y axis\r\n  \t\t\tWHEN y=2 THEN N' \u2026to\u2026   \u2502' --and we also add the maximum and minimum values\r\n  \t\t\tWHEN y=1 THEN CONVERT(CHAR(8),@Maximum)+N'\u2502' \r\n  \t\t\tELSE N'        \u2502'end+\r\n  --SELECT '      \u2502'+\r\n  MAX(CASE WHEN x=1 AND  filled=1 THEN N'\u2588' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=2 AND  filled=1 THEN N'\u2588' WHEN x=2  AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=3 AND  filled=1 THEN N'\u2588' WHEN x=3  AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=4 AND  filled=1 THEN N'\u2588' WHEN x=4  AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=5 AND  filled=1 THEN N'\u2588' WHEN x=5  AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=6 AND  filled=1 THEN N'\u2588' WHEN x=6  AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=7 AND  filled=1 THEN N'\u2588' WHEN x=7  AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=8 AND  filled=1 THEN N'\u2588' WHEN x=8  AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=9 AND  filled=1 THEN N'\u2588' WHEN x=9  AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=10 AND filled=1 THEN N'\u2588' WHEN x=10 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=11 AND filled=1 THEN N'\u2588' WHEN x=11 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=12 AND filled=1 THEN N'\u2588' WHEN x=12 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=13 AND filled=1 THEN N'\u2588' WHEN x=13 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=14 AND filled=1 THEN N'\u2588' WHEN x=14 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=15 AND filled=1 THEN N'\u2588' WHEN x=15 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=16 AND filled=1 THEN N'\u2588' WHEN x=16 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=17 AND filled=1 THEN N'\u2588' WHEN x=17 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=18 AND filled=1 THEN N'\u2588' WHEN x=18 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=19 AND filled=1 THEN N'\u2588' WHEN x=19 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=20 AND filled=1 THEN N'\u2588' WHEN x=20 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=21 AND filled=1 THEN N'\u2588' WHEN x=21 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=22 AND filled=1 THEN N'\u2588' WHEN x=22 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=23 AND filled=1 THEN N'\u2588' WHEN x=23 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=24 AND filled=1 THEN N'\u2588' WHEN x=24 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=25 AND filled=1 THEN N'\u2588' WHEN x=25 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=26 AND filled=1 THEN N'\u2588' WHEN x=26 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=27 AND filled=1 THEN N'\u2588' WHEN x=27 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=28 AND filled=1 THEN N'\u2588' WHEN x=28 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=29 AND filled=1 THEN N'\u2588' WHEN x=29 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=30 AND filled=1 THEN N'\u2588' WHEN x=30 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=31 AND filled=1 THEN N'\u2588' WHEN x=31 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=32 AND filled=1 THEN N'\u2588' WHEN x=32 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=33 AND filled=1 THEN N'\u2588' WHEN x=33 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=34 AND filled=1 THEN N'\u2588' WHEN x=34 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=35 AND filled=1 THEN N'\u2588' WHEN x=35 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=36 AND filled=1 THEN N'\u2588' WHEN x=36 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=37 AND filled=1 THEN N'\u2588' WHEN x=37 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=38 AND filled=1 THEN N'\u2588' WHEN x=38 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=39 AND filled=1 THEN N'\u2588' WHEN x=39 AND  filled=2 THEN N'\u2584' ELSE N' ' END)+\r\n  MAX(CASE WHEN x=40 AND filled=1 THEN N'\u2588' WHEN x=40 AND  filled=2 THEN N'\u2584' ELSE N' ' END)\r\n  FROM \r\n  (\r\n   SELECT x,y, --we work out whether the cell has data or not (1-filled, 2 means around half)\r\n  \t\tCASE WHEN f.frequency*10.00\/@MaxIndividualCount&gt;CONVERT(NUMERIC(8,2),y) THEN 1\r\n  \t\t     WHEN f.frequency*10.00\/@MaxIndividualCount&gt;CONVERT(NUMERIC(8,2),y-0.5) THEN 2\r\n  \t\t     ELSE 0 END AS filled\r\n   FROM\r\n  (VALUES ( 1), (2), (3), (4), (5), (6), (7), (8), (9), (10))OneToTen(y)--the Y axis (inverted!)\r\n  CROSS join\r\n  (SELECT OneToForty.value AS x, SUM(CASE WHEN g.value IS NOT NULL THEN 1 ELSE 0 END )AS frequency\t\r\n  FROM \r\n  (VALUES --the range of each 'pocket'\r\n  (1), (2), (3), (4), (5), (6), (7), (8), (9), (10),( 11), (12), (13), (14), (15), (16), (17), (18), (19), (20),\r\n  (21), (22), (23), (24), (25), (26), (27), (28), (29), (30),( 31), (32), (33), (34), (35), (36), (37), (38), (39), (40)\r\n  )OneToForty(value)\r\n  LEFT OUTER JOIN\r\n  \t @variable g\r\n  ON g.value=OneToForty.value\r\n  GROUP BY OneToForty.value)f\r\n  )g\r\n  GROUP BY y\r\n  ORDER BY y DESC<\/pre>\n<p>Typically, if your data is normally distributed, you\u2019ll get histograms like this appearing in the results pane ..<\/p>\n<pre class=\"\">  5600    \u2502                 \u2584\u2584\u2584\u2584                   \r\n          \u2502              \u2584\u2584\u2588\u2588\u2588\u2588\u2588\u2588\u2584                 \r\n          \u2502             \u2584\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584               \r\n          \u2502            \u2584\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584              \r\n          \u2502           \u2584\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584             \r\n          \u2502          \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584            \r\n          \u2502        \u2584\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584          \r\n  206.91  \u2502       \u2584\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584         \r\n   \u2026to\u2026   \u2502     \u2584\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584       \r\n  792.16  \u2502  \u2584\u2584\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584\u2584    <\/pre>\n<p>But here is the distribution of the first characters of the AdventureWorks customers<\/p>\n<pre class=\"\">  2130        \u2502                  \u2584                     \r\n              \u2502                 \u2588\u2588                     \r\n              \u2502                 \u2588\u2588                     \r\n              \u2502  \u2584         \u2588    \u2588\u2588                     \r\n              \u2502 \u2584\u2588   \u2588\u2584    \u2588  \u2584 \u2588\u2588                     \r\n              \u2502 \u2588\u2588   \u2588\u2588   \u2588\u2588  \u2588 \u2588\u2588   \u2588                 \r\n              \u2502\u2588\u2588\u2588   \u2588\u2588   \u2588\u2588  \u2588 \u2588\u2588   \u2588                 \r\n  65.00       \u2502\u2588\u2588\u2588\u2584  \u2588\u2588 \u2588 \u2588\u2588  \u2588 \u2588\u2588\u2588  \u2588                 \r\n     \u2026to\u2026     \u2502\u2588\u2588\u2588\u2588 \u2584\u2588\u2588 \u2588\u2588\u2588\u2588\u2584 \u2588 \u2588\u2588\u2588  \u2588 \u2584\u2588              \r\n  90.00       \u2502\u2588\u2588\u2588\u2588\u2584\u2588\u2588\u2588 \u2588\u2588\u2588\u2588\u2588\u2584\u2588 \u2588\u2588\u2588 \u2588\u2588\u2588\u2588\u2588  <\/pre>\n<p>You can get a very skewed distribution<\/p>\n<pre class=\"\">  15642       \u2502  \u2584                                     \r\n              \u2502 \u2588\u2588\u2588                                    \r\n              \u2502 \u2588\u2588\u2588\u2584                                   \r\n              \u2502 \u2588\u2588\u2588\u2588                                   \r\n              \u2502 \u2588\u2588\u2588\u2588\u2588                                  \r\n              \u2502 \u2588\u2588\u2588\u2588\u2588\u2584                                 \r\n              \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584                                \r\n  0.11        \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584                               \r\n     \u2026to\u2026     \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584                              \r\n  690.66      \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2584\u2584                           <\/pre>\n<p>\u2026or one that is boringly flat: evenly distributed.<\/p>\n<pre class=\"\">  2613        \u2502 \u2584\u2584 \u2584\u2584\u2584\u2584\u2584\u2584\u2584\u2584 \u2584 \u2584\u2584 \u2584\u2584 \u2584\u2584 \u2584\u2584\u2584 \u2584 \u2584\u2584  \u2584 \u2584\u2584\u2584\u2584\r\n              \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n              \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n              \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n              \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n              \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n              \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  0.01        \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n     \u2026to\u2026     \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  1199.96     \u2502\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588<\/pre>\n<p>And you can get anything in between!<\/p>\n<pre class=\"\">  2124        \u2502                               \u2584    \u2584\u2584\u2584 \r\n              \u2502                               \u2588\u2584\u2588\u2588\u2584\u2588\u2588\u2588 \r\n              \u2502                           \u2584\u2584\u2584\u2584\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588 \r\n              \u2502                           \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588 \r\n              \u2502                           \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588 \r\n              \u2502                           \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588 \r\n              \u2502                           \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  38532.00    \u2502                           \u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n     \u2026to\u2026     \u2502              \u2584   \u2584\u2584\u2584\u2584\u2584\u2584\u2584\u2584\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\r\n  39658.00    \u2502 \u2588\u2584\u2584\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2588<\/pre>\n<p>To do all this, we have created a function that does the histogram and now we are simply passing the independent variable to the graphing routine.<\/p>\n<pre class=\"theme:ssms2012-simple-talk lang:tsql decode:true \">DECLARE @Variable IndependentVariable; --a simple table of numbers\r\n  DECLARE @Mean FLOAT; --the mean (and median if normally distributed)\r\n  DECLARE @StandardDeviation FLOAT; --(the Standard deviation you want)\r\n  DECLARE @ii INT; --counter\r\n  DECLARE @TotalWanted INT; --counter\r\n  SELECT @Mean = 500, @StandardDeviation = 100, @ii = 1, @TotalWanted = 100000;\r\n  SET NOCOUNT ON;\r\n  \/* now we generate the numbers we want, with the mean and standard deviation we want *\/\r\n  WHILE @ii &lt;= @TotalWanted\r\n    BEGIN\r\n    INSERT INTO @Variable\r\n      (number)\r\n      SELECT ((RAND() * 2 - 1) + (RAND() * 2 - 1) + (RAND() * 2 - 1))\r\n             * @StandardDeviation + @Mean;\r\n    SELECT @ii = @ii + 1;\r\n    END;\r\n  SELECT ColumnHistogram.line AS 'Normally Distributed Variable '\r\n    FROM dbo.ColumnHistogram(@Variable)\r\n    ORDER BY ColumnHistogram.y DESC;\r\n  GO\r\n  DECLARE @SecondVariable IndependentVariable; --a simple table of numbers\r\n  INSERT INTO @SecondVariable\r\n    (number)\r\n    SELECT ASCII(SUBSTRING(Person.LastName, 1, 1))\r\n    FROM AdventureWorks2012.Person.Person;\r\n  SELECT ColumnHistogram.line AS 'Customers Surnames'\r\n    FROM dbo.ColumnHistogram(@SecondVariable)\r\n    ORDER BY ColumnHistogram.y DESC;\r\n  GO\r\n  DECLARE @ThirdVariable IndependentVariable; --a simple table of numbers\r\n  INSERT INTO @ThirdVariable\r\n    (number)\r\n    SELECT CONVERT(FLOAT, SalesOrderHeader.OrderDate)\r\n    FROM AdventureWorks2012.Sales.SalesOrderHeader;\r\n  SELECT ColumnHistogram.line AS 'No. of orders per date period'\r\n    FROM dbo.ColumnHistogram(@ThirdVariable)\r\n    ORDER BY ColumnHistogram.y DESC;<\/pre>\n<p>The final version is a table-valued function that takes a table-valued parameter as its input. This is a table with a single column consisting of a float. It returns a table that has to be ordered in the right way according to the Y value. Each line represents a row of the histogram in Unicode characters (only three are used for the actual histogram). The histogram looks better if you put the title in the alias for the Line column.<\/p>\n<p>The histogram must be viewed in a monospaced font, and if it is used in SSMS, this is likely to be the case unless you view it in \u2018grid\u2019 mode. You must have \u2018results to text\u2019 selected.<\/p>\n<p>The current version is <a href=\"https:\/\/github.com\/Phil-Factor\/DistributionHistogram\/blob\/master\/FrequencyHistogram.sql\">here in Github<\/a>. A version is attached to the article but we don\u2019t always succeed in keeping that up-to-date.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are many reasons for wanting to know how data is distributed. Sometimes you just want a rough idea of the way that data is distributed in a column. You may think, wouldn&#8217;t it be nice to have a SQL function that just showed you roughly what the distribution was, graphically, in the results pane.  Phil Factor thought that was well and turned the vague wish into reality.&hellip;<\/p>\n","protected":false},"author":154613,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2,143531],"tags":[],"coauthors":[6813],"class_list":["post-70929","post","type-post","status-publish","format-standard","hentry","category-blogs","category-t-sql-programming-sql-server"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/70929","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/154613"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=70929"}],"version-history":[{"count":4,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/70929\/revisions"}],"predecessor-version":[{"id":70936,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/70929\/revisions\/70936"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=70929"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=70929"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=70929"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=70929"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}