Like all Red Gate products, SQL Source Control uses SmartAssembly to report back any problems that cause exceptions. In addition, its feature usage reporting gives us the sort of data that web developers are more used to, without any extra work on our part.
When we collect data, we collect only anonymous data, and then only from the fifty per cent of users who actively opt in. There's a single opt-in dialog shared across all of our tools, so the user doesn't have to be bothered again and again.
When users start SSMS (SQL Server Management Studio) they don't want to be interrupted by an opt-in dialog for SQL Prompt, then SQL Source Control, and so on, so they are only asked once. There's a page on the SmartAssembly website that lists all the things that SmartAssembly collects, and it also says that the tools may collect a few additional items of anonymous data.
The usage data comes back through the same channel as error reporting; it's well established at this stage and works through most firewall setups.
Out of the box, feature usage reporting gave us hard data on the user's system, both hardware and software. For no work, we can see the number of CPU cores, which tells us whether we should be making use of parallelism to get the best out of the available cores. It can also tell us which version of .NET is being used.
Once we'd seen how useful the general data about the user environment was, we started to ask more application-specific questions by marking up our code with the attributes and method calls to send back more information about the user's environment, such as the SQL Server versions that they connect to, SSMS versions, and other data about software configuration. Although this is helpful to us, we haven't yet made any big decisions based on it.
Framing a specific question about usage: an example
What we found most useful about SmartAssembly was that it allowed the running application to send back something that answered a specific question we had about usage. Feature usage reporting can be programmed to send back anything we ask it to, simply by making one of the method calls. These have to be planned in advance, so as part of every code review we got into the habit of asking, 'can this be improved with feature usage reporting?'
In our case, we had a concrete question: 'How often does the default trace roll over?' We use the default trace in Source Control to tell us who made a change to the database. SQL Server collects this trace by default in order to provide an audit trail for a number of events, including some DDL changes. The system keeps five files of data, so at any given time one is being written to and the others are historical. It rolls over every so often, and we wondered how far back this historical information would stretch on a busy development server. Before we introduced this, we asked our DevOps team how often their SQL Server traces rolled over, and they told us it was in the region of a few months.
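To make the technique concrete, this is a sketch of the kind of query involved, not Red Gate's actual code: find the active default trace file via `sys.traces`, then read the DDL events it holds, including the login that raised each one.

```sql
DECLARE @path nvarchar(260);

-- Locate the file that the default trace is currently writing to.
SELECT @path = path
FROM sys.traces
WHERE is_default = 1;

-- Read the trace and keep only the object-change (DDL) events.
SELECT t.StartTime,
       t.LoginName,          -- who made the change
       t.DatabaseName,
       t.ObjectName,
       e.name AS EventName   -- Object:Created / Object:Altered / Object:Deleted
FROM sys.fn_trace_gettable(@path, DEFAULT) AS t
JOIN sys.trace_events AS e
  ON t.EventClass = e.trace_event_id
WHERE e.name IN (N'Object:Created', N'Object:Altered', N'Object:Deleted')
ORDER BY t.StartTime DESC;
```

The crucial limitation, as described below, is that this only sees as far back as the trace files happen to reach.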
We knew from customers that this wasn't the case for them; it wasn't a few months, because theirs were development machines with a large number of DDL changes taking place, but we felt that our simple approach was probably the best compromise. When we first tried feature usage reporting, we decided to check whether our assumption was correct: that we could just refine the approach we already had, processing one trace file to find out which user made each change. Although we were aware that the trace would roll over more frequently whenever there were a lot of development changes, we were surprised by what we discovered. Feature usage reporting showed that most users only get ten minutes or so of this trace, so if a change was made more than ten minutes ago, we won't be able to tell which user made it.
This rollover happens much more frequently than we thought, and our own process was adding to the problem by generating auditable events as well. Previously, we'd only read one of these trace files, for performance reasons, so the obvious way to fix the problem seemed to be to read the other four as well to get a reasonable history. But even that wouldn't fix the problem, because it's much more severe than we thought, and so we needed to think of another way to gain a history of DDL changes.
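Reading all five files rather than just the current one is itself only a small change. One common trick, sketched here with the caveat that it relies on the default trace's file-naming convention rather than on anything Red Gate shipped, is to pass the base file name to `fn_trace_gettable` so that it reads every rollover file still on disk, and then measure how far back the combined history actually reaches:

```sql
DECLARE @path nvarchar(260);

-- sys.traces gives the path of the *current* file (e.g. ...\log_123.trc);
-- substituting the base name 'log.trc' makes fn_trace_gettable read all
-- the rollover files in sequence.
SELECT @path = REVERSE(SUBSTRING(REVERSE(path), CHARINDEX(N'\', REVERSE(path)), 260)) + N'log.trc'
FROM sys.traces
WHERE is_default = 1;

-- Span between the oldest and newest events across every surviving file.
SELECT MIN(StartTime) AS OldestEvent,
       MAX(StartTime) AS NewestEvent,
       DATEDIFF(minute, MIN(StartTime), MAX(StartTime)) AS MinutesOfHistory
FROM sys.fn_trace_gettable(@path, DEFAULT);
```

On a busy development server, even this combined window can be far shorter than expected, which is exactly what the usage data revealed.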
The solution is to keep a more permanent record of all the DDL events ourselves and use that. It's a little more work for us, but it also does away with all the extra DDL trace records that our process was adding to the default trace. Because we have information from users, we know we have to use this redesigned method: the obvious solution wouldn't have worked. This saved us development time, testing time, and the time spent shipping it to users. If we'd done the work to implement the obvious solution, it might have been four weeks before we found out that, actually, it hadn't fixed the problem for most people. The great thing about it is that we can detect that there will be a problem and fix it before the problem actually appears on customer sites.
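One standard way to keep such a permanent record, shown here as a minimal sketch of the general idea rather than Red Gate's implementation (the table and trigger names are made up for illustration), is a database-level DDL trigger that appends every DDL event, with the login that raised it, to an audit table:

```sql
-- Permanent audit table for DDL events.
CREATE TABLE dbo.DdlEventLog
(
    EventTime  datetime2 NOT NULL DEFAULT SYSDATETIME(),
    LoginName  sysname   NOT NULL,
    EventData  xml       NOT NULL   -- full event details, as produced by EVENTDATA()
);
GO

-- Fires for every database-level DDL statement (CREATE/ALTER/DROP, etc.).
CREATE TRIGGER trg_LogDdlEvents
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    INSERT INTO dbo.DdlEventLog (LoginName, EventData)
    VALUES (ORIGINAL_LOGIN(), EVENTDATA());
END;
```

Unlike the default trace, a record like this survives rollover indefinitely, at the cost of maintaining the table and trigger yourself.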
It is difficult to think of an effective alternative to doing this. There's no way you would ever work this out without using feature usage reporting to send back the data. Users would probably tell us that a DDL change to a database had no user associated with it, but having them diagnose it would have been quite difficult. The best time to work out why something is going wrong is while the code is executing, and this is data that neither the user nor our support team will have. Instead of a sample of maybe ten users (the most that one can realistically expect to do in-depth debugging with), we get a good proportion of users sending this information back to us.
Getting the most from Early Access programs
Feature usage reporting is great for getting real data from the field if you have a specific question. This works best if you're doing Scrum or some other Agile process where you release frequently to a large Early Access (EA) group, as Source Control does, because then you can think a few sprints in advance. If instead you release via a waterfall process, you might have to wait months between asking the question and getting it answered, and you can't really base any decisions on that. If we need a question answered to do a sprint, we ask it now, so that by the time we come to do the work we know the answer. For an implementation detail there might be twenty different alternative avenues, and the data lets us decide which route is best.
From an engineering point of view, the Early Access program provides us with the most opportunities for driving development from the feedback. We can think of technical questions that we need answers to and then find out the answers from the usage of the application.
A lot of information can actually help us once the software goes into production. As an example of this, I was looking through our features this morning, and there's one where, in Source Control, we have a commit comment. When you commit to the source control system, you can associate a comment with it, and this comment can be arbitrarily long, spanning any number of lines. We want to get the size of the text box just about right, so that it accommodates most comments without scrolling. We can now see that the average user writes three lines of comments, so we know that the text box only needs to be three lines high for the average user.
Tracking usage and workflow
Initially we assumed that SmartAssembly's feature usage reporting would be able to tell us what our users were doing with the product so we could improve it. You can certainly see, as you issue EAs, whether people are finding a new feature. We have created a filtering feature where you can filter out certain object types, and we can deduce quite a lot of interesting information about general usage from this. However, the most important thing we need to know is where people drop out during a workflow, and this is hard to find out with feature usage reporting at present, because it doesn't track workflows.
If your development team are doing rapid releases and have a well-used early access program, then SmartAssembly can provide the information to help make better decisions on how to improve the application. Although the built-in reporting from SmartAssembly is valuable, the more targeted metrics that you can create using SmartAssembly Method calls are even more effective. With the right metrics in place, we can detect that there will be a problem and fix it before the problem actually appears on customer sites.
Without this system, the Early Access program would be a lot more burdensome for the user. With the right feedback going back to the developers, users can get on with using the application without having to send screen dumps or crash reports to the developers or make support calls. With the more advanced metrics, we can go well beyond what the user is consciously aware of. There’s no way that you would ever work this out without sending back this data. If you require a long-suffering user to run a diagnostic utility to get this type of information for you, you can’t expect a sample of more than 10 users to do in-depth debugging with you. With SmartAssembly, we can get a majority of our Early Access users to send this information back to us without any effort on their part.