{"id":3542,"date":"2012-05-03T15:23:00","date_gmt":"2012-05-03T15:23:00","guid":{"rendered":"https:\/\/test.simple-talk.com\/uncategorized\/metrics-a-little-knowledge-can-be-a-dangerous-thing-or-why-youre-not-clever-enough-to-interpret-metrics-data\/"},"modified":"2016-07-28T10:50:47","modified_gmt":"2016-07-28T10:50:47","slug":"metrics-a-little-knowledge-can-be-a-dangerous-thing-or-why-youre-not-clever-enough-to-interpret-metrics-data","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/blogs\/metrics-a-little-knowledge-can-be-a-dangerous-thing-or-why-youre-not-clever-enough-to-interpret-metrics-data\/","title":{"rendered":"Metrics &#8211; A little knowledge can be a dangerous thing (or &#8216;Why you&#8217;re not clever enough to interpret metrics data&#8217;)"},"content":{"rendered":"<p>At RedGate Software, I work on a .NET obfuscator called <a href=\"http:\/\/www.red-gate.com\/products\/dotnet-development\/smartassembly\/\">SmartAssembly<\/a>. Several of its features use a database to store data (exception reports, name-mappings, etc.). The user is given the option of using either a SQL Server database (which requires them to have Microsoft SQL Server) or a Microsoft Access MDB file (which requires nothing). MDB is the default option, but power-users soon switch to a SQL Server database because it offers better performance and data-sharing.<\/p>\n<p>In the fashionable spirit of optimization and metrics, an obvious product-management question is &#8216;Which is more popular: 
SQL Server or MDB?&#8217;<\/p>\n<p>We&#8217;ve collected data on this using our &#8216;Feature-Usage-Reporting&#8217; technology (available as part of <a href=\"http:\/\/www.red-gate.com\/products\/dotnet-development\/smartassembly\/\">SmartAssembly<\/a>) and, more recently, our &#8216;Application Metrics&#8217; technology:<\/p>\n<table>\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>Parameter<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Number of users<\/p>\n<\/td>\n<td valign=\"top\">\n<p>% of total users<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Number of sessions<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Number of usages<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>SQL Server<\/p>\n<\/td>\n<td valign=\"top\">\n<p>28<\/p>\n<\/td>\n<td valign=\"top\">\n<p>19.0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>8115<\/p>\n<\/td>\n<td valign=\"top\">\n<p>8115<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>MDB<\/p>\n<\/td>\n<td valign=\"top\">\n<p>114<\/p>\n<\/td>\n<td valign=\"top\">\n<p>77.6<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1449<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1449<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><em>(As a disclaimer, please note that SmartAssembly has far more than 132 users. This data is just a selection from one build.)<\/em><\/p>\n<p>So, it would appear that SQL Server is used by fewer users, but more often. Great.<\/p>\n<p>But here&#8217;s why these numbers are useless to me:<\/p>\n<h4>Only the original developers understand the data<\/h4>\n<p>What does a single &#8216;usage&#8217; of &#8216;MDB&#8217; mean? Does this happen once per run? Once per option change? On clicking the &#8216;Obfuscate Now&#8217; button? When running the command-line version, or only the UI version? Each question could skew the data 10-fold either way, and the answers are known only to the developer who instrumented the application in the first place. 
In other words, only the original developer can interpret the data; product managers cannot interpret it unaided.<\/p>\n<h4>Most of the data is from uninterested users<\/h4>\n<p>About half of the people who download and run a free trial from the internet quit it almost immediately. Only a small fraction use it enough to make informed choices. Since MDB is the default option, we don&#8217;t know how many of those 114 users were CHOOSING to use MDB, and how many were JUST HAPPENING to use the MDB default for their 20-second trial.<\/p>\n<p>This is a problem we see across all our metrics: are people using X because it&#8217;s the default, or because they <em>want<\/em> to use X? We need to segment the data further &#8211; asking what percentage of each percentage meets our criteria for an &#8216;established user&#8217; or &#8216;informed user&#8217;. You end up spending hours writing sophisticated and dubious SQL queries to slice the data ever more finely. Not fun.<\/p>\n<h4>You can&#8217;t find out <em>why<\/em> they used this feature<\/h4>\n<p>Metrics can answer the when and the what, but not the why. Why did people use feature X? If you&#8217;re anything like me, you often click on random buttons in unfamiliar applications just to explore the feature-set. If we listened uncritically to metrics at RedGate, we would eliminate the most important and most complex features, the ones people actually buy the software for, leaving just the big buttons on the main page and the About box.<\/p>\n<h4>&#8220;Ah, that&#8217;s interesting!&#8221; rather than &#8220;Ah, that&#8217;s actionable!&#8221;<\/h4>\n<p>People do love data. Did you know you eat 1201 chickens in a lifetime? But just 4 cows? Interesting, but useless. Metrics often give you a nice number: &#8216;5.8% of users have 3 or more monitors&#8217;. 
But unless the statistic is both SURPRISING and ACTIONABLE, it&#8217;s useless.<\/p>\n<p>Most metrics are collected, reviewed with lots of cooing, and then forgotten. Unless a piece of data could change what you do, there&#8217;s no point collecting it.<\/p>\n<h4>People get obsessed with significance levels<\/h4>\n<p>The first thing lots of people do with this data is run a t-test to get a significance level (<em>&#8220;Hey! We know with 99.64% confidence that people prefer SQL Server to MDBs!&#8221;<\/em>). Believe me: the other causes of error\/misinterpretation in your data are FAR larger than the sampling error a t-test measures.<\/p>\n<h4>Confirmation bias prevents objectivity<\/h4>\n<p>If the data appears to match our instinct, we feel satisfied and move on. If it doesn&#8217;t, we suspect the data and dig deeper, plummeting down a rabbit-hole of segmentation and filtering until we give up and move on. Data is only useful if it can change our preconceptions. Do you trust this dodgy data more than your own understanding, knowledge and intelligence? I don&#8217;t.<\/p>\n<h4>There are always multiple plausible ways to interpret\/act on any data<\/h4>\n<p>Let&#8217;s say we segment the above data and get this:<\/p>\n<h6>Post-trial users (i.e. 
those using a paid version after the 14-day free trial is over): <\/h6>\n<table>\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>Parameter<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Number of users<\/p>\n<\/td>\n<td valign=\"top\">\n<p>% of total users<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Number of sessions<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Number of usages<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>SQL Server<\/p>\n<\/td>\n<td valign=\"top\">\n<p>13<\/p>\n<\/td>\n<td valign=\"top\">\n<p>9.0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1115<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1115<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>MDB<\/p>\n<\/td>\n<td valign=\"top\">\n<p>5<\/p>\n<\/td>\n<td valign=\"top\">\n<p>4.2<\/p>\n<\/td>\n<td valign=\"top\">\n<p>449<\/p>\n<\/td>\n<td valign=\"top\">\n<p>449<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h6>Trial users:<\/h6>\n<table>\n<tbody>\n<tr>\n<td valign=\"top\">\n<p>Parameter<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Number of users<\/p>\n<\/td>\n<td valign=\"top\">\n<p>% of total users<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Number of sessions<\/p>\n<\/td>\n<td valign=\"top\">\n<p>Number of usages<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>SQL Server<\/p>\n<\/td>\n<td valign=\"top\">\n<p>15<\/p>\n<\/td>\n<td valign=\"top\">\n<p>10.0<\/p>\n<\/td>\n<td valign=\"top\">\n<p>7000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>7000<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td valign=\"top\">\n<p>MDB<\/p>\n<\/td>\n<td valign=\"top\">\n<p>114<\/p>\n<\/td>\n<td valign=\"top\">\n<p>77.6<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<td valign=\"top\">\n<p>1000<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>How do you interpret this data? It could mean any one of these:<\/p>\n<ol>\n<li>The people who buy our software are mostly SQL Server users. People who can&#8217;t afford SQL Server tend to be unable or unwilling to buy our software. 
Therefore, ditch MDB support.<\/li>\n<li>Our MDB support is so poor and buggy that our massive MDB user-base doesn&#8217;t buy the product. Therefore, spend loads of money improving it, and think about ditching SQL Server support.<\/li>\n<li>People &#8216;graduate&#8217; naturally from MDB to SQL Server as they use the software more. Things are fine the way they are.<\/li>\n<li>We&#8217;re marketing the tool wrong. The large number of MDB users represents uninformed downloaders. Tell marketing to target SQL Server users aggressively.<\/li>\n<\/ol>\n<p>To choose an interpretation, you need to segment again. And again. And again, and again.<\/p>\n<h4>Opting-out is correlated with feature-usage<\/h4>\n<p>Metrics collection tends to be opt-in, which skews the data even further. Between 5% and 30% of people choose to opt in to metrics (often called a &#8216;customer improvement program&#8217; or something like that). Casual trial-users who are uninterested in your product or company are less likely to opt in, and this group is also likely to be mostly MDB users. How much does this skew your data? Who knows?<\/p>\n<h4>It&#8217;s not all doom and gloom<\/h4>\n<p>There are some things metrics can answer well:<\/p>\n<ol>\n<li>Environment facts. How many people have 3 monitors? Have Windows 7? Have .NET 4 installed? Have Japanese Windows?<\/li>\n<li>Minor optimizations. Is the text box big enough for average user input?<\/li>\n<li>Performance data. How long does our app take to start? How many databases does the average user have on their server?<\/li>\n<\/ol>\n<p>As you can see, questions about who the user is, rather than what the user does, are easier to answer and act on.<\/p>\n<h4>Conclusion<\/h4>\n<ol>\n<li>Use <a href=\"http:\/\/www.red-gate.com\/products\/dotnet-development\/smartassembly\/\">SmartAssembly<\/a>. 
If not for the metrics (called &#8216;Feature-Usage-Reporting&#8217;), then at least for the obfuscation\/error-reporting.<\/li>\n<li>Data raises more questions than it answers.<\/li>\n<li>Questions about environment are the easiest to answer.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>At RedGate Software, I work on a .NET obfuscator called SmartAssembly. Various features of it use a database to store various things (exception reports, name-mappings, etc.) The user is given the option of using either a SQL-Server database (which requires them to have Microsoft SQL Server), or a Microsoft Access MDB file (which requires nothing)&#8230;.&hellip;<\/p>\n","protected":false},"author":95472,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[2],"tags":[],"coauthors":[],"class_list":["post-3542","post","type-post","status-publish","format-standard","hentry","category-blogs"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3542","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/95472"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=3542"}],"version-history":[{"count":2,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3542\/revisions"}],"predecessor-version":[{"id":42155,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/3542\/revisions\/42155"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=3542"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:
\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=3542"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=3542"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=3542"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}