Database Geek of the Week – Michael Rys

Dr. Michael Rys is the program manager for the SQL Server Engine Team at Microsoft. Michael also represents Microsoft on the W3C XQuery Working Group and has a seat on the SQL Standardization committee. He is also a co-author of XQuery from the Experts.

The following questions were asked by me and answered by Michael via email.

Doug: I notice you have a doctorate. Could I ask what your degree is in?

Michael: Did you have to start with this question? 😉

My doctorate is in computer science, on “Materialization and Parallelism in the Mapping of an Object Model to a Relational Multi-Processor System”, at ETH Zurich, Switzerland.

Doug: What impact has your academic background, as opposed to your practical experiences, had on your database work?

Michael: It gave me a solid foundation to put the real world issues that users experience into a framework that provides an efficient and general solution to the users’ problems. A non-exhaustive list of the skills that my academic background provided me is as follows:

  1. A fundamental understanding of the technology; what is possible and what may become possible in the future
  2. A sound grasp of the analytical processes and tools required to understand the big picture and perform the necessary benefit/cost analysis
  3. The ability to communicate with the research community, help to transform their ideas into solutions, and tell them what matters to users

It also allows interviewers to ask me about my academic background 😉

Doug: When you program in .NET, which is your favored .NET language?

Michael: As a program manager, I rarely get to write more than demo code. Obviously, I am a huge fan of SQL and XQuery, but also like Visual C# and the upcoming Visual Basic 9.0 with Project LINQ integration…

Doug: Microsoft has encouraged their employees to take part in blogging and you have a reasonably active blog. What have you found to be the best part of blogging? The worst?

Michael: I like the fact that my customers get to interact with me directly, “unfiltered” (and vice versa). Also that I can very quickly disseminate information and that I can enter into a dialog with people like Jon Udell. The major downside used to be the comment spam, but now it is the fact that I currently don’t have enough time to write much…

Doug: The Microsoft hiring process is legendary. Can you tell us a little about how you managed to start working at Microsoft?

Michael: I was asked twice to send in my resume by colleagues who started working at Microsoft in the early 1990s, but I wanted to finish my academic studies first (what a financial mistake). When I finally finished my post-doctoral studies at Stanford University in 1998, Adam Bosworth was running the XML team at Microsoft and was looking for help in integrating XML and databases. He felt that my database background and understanding of semi-structured data would be useful, so he invited me for an interview with his team. And the rest is history…

Doug: Where in the SQL Server world do you think XML best fits?

Michael: I think one of the advantages and, to some extent, disadvantages of XML is that it can be used for many different things: as a transport format for data interchange, as a markup document format (for example, XHTML, WordML etc.), as a semi-structured data representation, and so on. The important thing to remember is that all this is data and therefore needs to be integrated with the data management platform – in our case, SQL Server. So XML fits as a transport format for relational data (which inside the database should still be processed as relational data), but it also enables easier management of semi-structured data inside the database, and allows us to unlock the information inside documents beyond simple full-text search – for example, enabling context-sensitive search.

Doug: One thing that I have noticed when working with SQL Server betas (including the betas for 2005) is that, generally, by Beta 2 the software is amazingly solid. This is noticeably true compared to any other Beta 2 from Microsoft or any other company I have done beta testing for. Can you comment on the beta process in the SQL Server group and how it does what it does?

Michael: Hmm, I don’t want to give our secrets away, but I think I can say that it is a combination of a) our exceptional developers, testers and program managers, b) the clear focus on quality by everyone in the team, and c) processes that allow us to test and fix bugs and make the right decision with regard to quality. Everyone in our organization understands that you cannot reboot a database to get rid of a memory leak once you are running in a 24/7 production environment. Last, but not least, it is also the dedication of our early adopters out there who are willing to run their databases on pre-beta software. Thanks to all of you!

Doug: You were a co-author on a multi-author book. How did you find working on a book with that many authors?

Michael: Since each of us wrote a chapter, and I needed to coordinate only with one of my co-authors to make sure that we were complementary and included references to each other, it was actually much easier than writing the whole book. The downside is that the revenue stream from the book is very small.

Doug: Have you read any good database related books lately? Any good general software development books?

Michael: There is a new edition out of “Database System Concepts” by Korth, Silberschatz and Sudarshan. And there are some good SQL Server 2005 books coming out (including – hopefully – one in which I write about the XML support in SQL Server 2005). A good general book is “Writing Secure Code” by Michael Howard. But recently I have been reading more management- and planning-related books.

Doug: What do you think about using VB.NET or C# for stored procedures, user-defined functions and triggers? What sort of guidance can you give to people looking to properly leverage the ability to use procedural code in SQL Server 2005?

Michael: I think the most important guidance I can give is that you should clearly understand the trade-offs between running the code outside of the database tier (thus moving load off the database server), and running it inside the database (thus taking load off the network). Once you’ve decided to run code inside the database tier, it’s important to make the correct choice between using T-SQL and using SQL CLR. If you are doing lots of data retrieval activity, I would strongly recommend using T-SQL. However, if you are doing CPU-intensive operations on special data types or creating a replacement for extended stored procedures, then I would highly recommend using the SQL CLR.

Doug: Can you think of any particularly cool tip or trick, especially in SQL Server 2005, that database developers may not know about?

Michael: There are a couple of interesting tips such as the use of SQL Profiler to see lots of interesting information about your queries, using query plans to see how your query is being executed, “forcing” the use of the old query plan using plan guides, using SQL Service Broker to build asynchronous applications, and so on. However, since I am the program manager for XQuery, I would like to share one tip that is XQuery-specific. If you use an exist() method in the SQL predicate, have at least a primary XML index specified, and the XQuery expression uses a predicate of the form /a/b[@c=5], then you should rewrite it to be of the form /a/b/@c[.=5]. Although the two expressions return different results (the first returns b elements and the second returns c attributes), the existential quantification of the exist() method makes them equivalent, and the second form will perform more efficiently.
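
To make the equivalence in Michael's tip concrete, here is a minimal sketch in Python using only the standard library. It is not SQL Server and it does not run XQuery: the helper names (select_b_elements, select_c_attributes, exists) and the sample documents are hypothetical, chosen only to model the two path expressions and the "is the result non-empty?" test that exist() applies. It says nothing about the performance difference, which depends on the XML index inside SQL Server.

```python
import xml.etree.ElementTree as ET


def _is_five(value: str) -> bool:
    """Numeric comparison against 5; non-numeric text simply does not match."""
    try:
        return float(value) == 5
    except ValueError:
        return False


def select_b_elements(root: ET.Element) -> list:
    """Rough analogue of /a/b[@c=5]: the result is a list of b elements."""
    if root.tag != "a":
        return []
    return [b for b in root.findall("b") if _is_five(b.get("c", ""))]


def select_c_attributes(root: ET.Element) -> list:
    """Rough analogue of /a/b/@c[.=5]: the result is a list of c attribute values."""
    if root.tag != "a":
        return []
    values = [b.get("c") for b in root.findall("b") if b.get("c") is not None]
    return [v for v in values if _is_five(v)]


def exists(result: list) -> bool:
    """Rough analogue of exist(): true when the path result is non-empty."""
    return len(result) > 0


# Hypothetical sample documents, including cases with no match at all.
SAMPLES = [
    '<a><b c="5"/><b c="7"/></a>',
    '<a><b c="7"/></a>',
    '<a><b/></a>',
    '<x><b c="5"/></x>',
]

for text in SAMPLES:
    root = ET.fromstring(text)
    by_element = select_b_elements(root)
    by_attribute = select_c_attributes(root)
    # The two results hold different kinds of items, yet agree on emptiness.
    print(text, exists(by_element), exists(by_attribute))
```

The two selection functions return different things (elements versus attribute values), but a c attribute equal to 5 exists exactly when a b element carrying it exists, so the two existence tests agree for every document.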

Do you know someone who deserves to be a Database Geek of the Week? Or perhaps that someone is you? Send me an email at editor@simple-talk.com and include “Database Geek of the Week suggestion” in the subject line.