Business Intelligence in SQL Server 2005, Part 1

From ad-hoc queries through managed reports to report models, Dejan Sarka evaluates SQL Server 2005 reporting solutions.

An Overview of SQL Server 2005 Reporting Solutions

The primary focus of today’s IT systems is deriving information from collections of data, and several technologies exist for this purpose. With SQL Server 2005, we get all the modern analytical technologies, from basic reporting, through OLAP systems, to data mining applications. In this article and the next, I will present all available possibilities in SQL Server 2005 and describe the strengths and weaknesses of each. I aim to demonstrate that, for a complete business intelligence solution, each has its place and that it’s only by the appropriate use of each one that we get the structured range of reports we need. Here, I focus on reporting solutions – from ad-hoc queries through managed reports to report models.

Both articles use the sample AdventureWorks and AdventureWorksDW databases. Adventure Works Cycles is a fictitious company that manufactures and sells metal and composite bicycles to North American, European and Asian commercial markets. They also sell other sports equipment and accessories. AdventureWorks is an OLTP database and AdventureWorksDW is the data warehouse database of the company.

In order to increase their sales, the company is preparing a mailing campaign. The response rate for such campaigns is notoriously poor – typically, only two percent. If the company could achieve a four percent response, they could halve the cost of the campaign while still achieving the same number of responses, but only printing half as many leaflets. They decide to find out what kind of people tend to buy bikes, so that they can perform a more targeted mailing campaign. Having obtained the data about prospective buyers, they will need to filter it. As a starting point in their learning process, they create a view joining the demographic information of their existing customers with a BikeBuyer column serving as a flag to indicate which customers have purchased a bike in the past. The name of the view is vTargetMail; you can find it in the AdventureWorksDW database.

Ad-hoc Transact-SQL queries

It’s possible for a developer to gain quite a bit of business intelligence just from using SQL Server Management Studio (SSMS) to write ad hoc queries. For example, consider the following simple query to the vTargetMail view:

CustomerKey

BikeBuyer

NumberCarsOwned

MaritalStatus

21972

0

2

M

18578

1

0

S

21303

0

2

M

15453

1

0

M

23014

0

2

M

Table 1: Data from vTargetMail

This result set does not tell me much. I suspect that the number of cars owned has some influence on bike ownership and I can verify this by calculating the percentage of bike buyers in the total population, broken down according to number of cars owned:

NumberCarsOwned

PctBuyers

0

63.40

1

55.21

2

40.10

3

42.19

4

36.95

Table 2: Percentage of bike buyers, broken down by number of cars owned

It is obvious that a lower numbers of cars owned leads to a higher probability of buying a bike. This is a significant result. However, I can use many other attributes to analyze my bike buyers. In addition, I want to try a breakdown over two input variables, i.e. I want to create a pivot table from my basic data. Pivot tables are easy to create in Excel but were quite difficult to create in SQL until the introduction of the PIVOT operator in SQL Server 2005. The following query displays the percentage of bike buyers in different subgroups of customers. The input variables, “number of cars owned” and “marital status”, define the subgroups (for example, one subgroup is “married customers with a single car”).

NumberCarsOwned

Married

Single

0

60.52

67.94

1

53.02

57.07

2

37.95

42.46

3

33.13

56.09

4

35.24

40.00

Table 3: Pivoting the percentage of bike buyers

This shows that single people with three cars are also a very interesting group.

For more on Transact-SQL queries for SQL Server 2005, read “Inside Microsoft® SQL Server⢠2005: T-SQL Querying” by Itzik Ben-Gan, Lubor Kollar, and Dejan Sarka, ISBN 0-7356-2313-9.

The advantage of ad-hoc querying is that I can do whatever I want – I could continue researching with many different queries. However, this method has many drawbacks. You cannot give SQL Server Management Studio to all of your end-users. You cannot expect them to learn the Transact-SQL language. They will want to see the results in a nicer format – the user interface is very important. You could use Excel® to format the results, but then you would have to maintain a separate Excel file for each end-user. Managing separate files for each user is not very practical and this is a big shortcoming of this approach. You should manage your reports centrally and use some automatic delivery method for them.

Also, do not forget that these analyses typically involve a large amount of data and by executing ad-hoc queries on a production system; you are placing an additional burden on that system. You can mitigate this problem by using snapshot isolation, database snapshots, or a replica of the production database. However, there is always some impact on the performance of SQL Server, no matter which technology you use, and you should be very careful when you are playing with a production database.

For details about snapshot isolation, database snapshots and replication, please refer to the MSDN library. For example:

Managed reports

The logical next step is to introduce managed reports. SQL Server Reporting Services (SSRS) came as a free add-in for SQL Server 2000 Standard or Enterprise Editions. In SQL Server 2005, SSRS is enhanced and fully integrated.

The complete life cycle of a managed report consists of authoring, management and delivery. You can author reports with Business Intelligence Development Studio (BIDS), using two templates in the BI Projects folder: Report Server Project Wizard and Report Server Project. With the latter, you go directly into a report design environment called Report Designer. Via a couple of screens of wizard-driven questions, you can quickly create a report and then return to Report Designer, where you can customize it. When the report is finished, you deploy it on a selected report server. After the deployment, the report is ready for the delivery.

For management of the deployed reports you can use SSMS or Report Manager, an ASP.NET application shipped with SSRS. The default delivery mechanism is to publish via a reporting portal, again shipped with SSRS, and accessible with Report Manager or through URL links from any application. This is a pull delivery method (the end-user has to pull the report); if you deploy the report from Visual Studio, it is also automatically published on the report portal. The default rendering is HTML but there are many additional rendering and delivery mechanisms available.

In the continuing search to discover what drives customers to decide to buy a bike, my next step is authoring a report. Starting with the wizard, the first thing I have to define is my data source, in this case my local SQL Server AdventureWorksDW database. Then I have to write the T-SQL query that will deliver the report data. I want to replicate the result of the earlier PIVOT query but I just need to collect the base data as pivoting is done via the Matrix data region control:

Next, I come to the Report Designer environment. The design window has three tabs: Data, Layout and Preview. In the Data tab, I can see my source query. I can change the layout using the Layout tab. I have to correct the layout, because the default aggregate function for the Details area of the Matrix data region is Sum, which gives an incorrect result for percentages. I replace it with the Avg function, as shown in Screenshot 1.

257-Sarka001.gif

Screenshot 1: Correct aggregate function

After previewing the report and adding minor enhancements, it is ready for deployment. You can deploy directly from BIDS (which is actually Visual Studio, as can be seen from Screenshot 1). After the deployment, analysts and managers can use Report Manager to view the report, as shown in Screenshot 2.

257-Sarka002.gif

Screenshot 2: View the pivot report in Report Manager

That was a major step forward. The authored report is already something that you can deliver to your end-users. Managed reports are extremely easy to use – delivery can be via email, for example. All the end-user has to do is to look at the report, which is why managed reports are so popular.

Nonetheless, they have many disadvantages. In the first place, somebody has to create the report. If my end-users wanted to replace the Marital status column with a Gender column, I would have to author another report, starting from scratch with a new source query. This is the main disadvantage of reports. Developers often get annoyed by having, over and over again, to create new reports that are almost the same as existing ones. Analysts and managers, in turn, are not satisfied with the fact they have to wait for hours, or even days, for every little change.

The problem can be somewhat mitigated by the use of parameterized reports; still, the basic structure is defined with the source query. Trying to teach end-users to use BIDS and write T-SQL queries so that they can create their own reports is a hopeless task. In addition, the reports still use the production database, so all of the warnings about ad-hoc queries still apply. SSRS can hold one or more snapshots of the data. These snapshots can lower the stress on the production systems, but they also might introduce another problem – the reports from the snapshots no longer show the latest data.

Report models

In order to successfully author reports, you have to learn more than just the BIDS environment and T-SQL language. Perhaps an even bigger problem is locating the data you want to measure, which requires you to learn the logical and physical schema of the database that supports your business application(s). In this example, we are using demo database with a nice pre-prepared view. In production databases, such views may not be available and, as a DBA or developer, you have to create one. The problem is that the database schema for an OLTP system can be quite complicated, including hundreds, if not thousands, of tables.

Once you understand the schema, you can prepare the necessary view. This view is a new layer of metadata between you and the actual tables, which hides the complexity of the data model and enables you to write simple queries. This is a big improvement, but is still not good enough to enable end-users to author their own reports. End-users do not want to write queries at all. Besides that, they need richer metadata. They need the data described in terms of entities and attributes, using business terms that they understand. In addition, they do not want to spend too much time on the layout of a report or to work with a complex tool like BIDS Report Designer.

You can solve these problems with report models. A report model is a semantic layer of metadata stored in the SSRS database, describing the data from a business point of view, adding many additional properties that enable the quick and easy authoring of reports. Of course, a report model is not something an end-user would prepare.

I can create a report model in BIDS using the Report Model Project template. I have to start with a data source (my local AdventureWorksDW database). In the Solution Explorer window of the BIDS, you will see an additional folder called Data Source Views. A data source view is metadata from the source database, transferred to the development machine, and stored in an XML format. They enable disconnected development. Authoring a report model can take a long time, and it would not be acceptable to work in a connected environment, connected to the production server. A data source view is a development-time object only; it is not a part of the deployment. In my data source view, I am including only the vTargetMail view from the AdventureWorksDW database. When the data source and the data source view are prepared, I can start the Report Model wizard by right-clicking on the Models folder in the Solution Explorer window. The wizard can create many metadata elements automatically, including entities from tables, attributes from columns, roles for associations between tables (foreign keys), date attributes variations (year, quarter, month, and so on) from datetime columns, different aggregations for numeric columns and much more, as shown in Screenshot 3.

257-Sarka003.gif

Screenshot 3: Report Model wizard generation rules

I accept all the wizard defaults and, after the wizard finishes its work, I come to the Semantic Model Designer window. Here I can add additional properties, such as conceptual names and format strings. I changed the expression for the Avg Bike Buyer attribute created by the wizard from AVERAGE(Bike Buyer) to AVERAGE(Bike Buyer) * 100, in order to get percentages instead of proportions. You can see the Semantic Model Designer with expanded date element variations and numeric aggregates in Screenshot 4.

257-Sarka004.gif

Screenshot 4: Semantic Model Designer

When the model is completed, you can deploy it to your report server just as you would deploy a report, and now your analysts can start authoring their own reports, using the Report Builder. If a user has permissions to use the Report Builder, they can access the tool from Report Manager. With Report Builder, authoring a report is a reasonably simple and straightforward process.

Report Builder

When you start Report Builder, you can select the layout for the data region (table, matrix or chart), and the model you are going to use. In my case, I selected the matrix data region, because I wanted to create the same pivot report as I did in previous examples. I dragged the Number Cars Owned attribute to the row area, Marital Status to the Column area and Avg Bike Buyer to the Details area. After a bit of polishing, such as enlarging the font and adding a report title, the report is finished. In Screenshot 5, you can see the design window of the Report Builder.

257-Sarka005.gif

Screenshot 5: Designing a report with Report Builder

Deployment from the Report Builder is simply a matter of clicking on the Save button, selecting the deployment folder and naming the report. After deployment, the report is available to any user, just as if the report were created using Visual Studio. You can use the Report Manager to view it.

Report models enable end-users to create their own ad-hoc reports. This is a great advantage over reports created with Report Designer or Report Wizard tools. However, report models do not solve all problems. Only reasonably “advanced” end-users will be able to use the Report Builder. Once again, of course, the reports use the production database, so all the warnings about ad hoc queries also apply here.

Conclusion

We are now halfway through looking at different ways of analyzing data using SQL Server 2005 tools. In next article, I am going to show an OLAP solution, create a real-time ROLAP variation of the solution, and finally use Data Mining to answer definitively the question: what drives customers to buy bikes?