The database-as-API approach
By Adam Machanic
This article is adapted from Adam’s forthcoming book, Expert SQL Server 2005, to be published by Apress later in 2006.
A seemingly never-ending battle in online database forums involves the question of whether or not database application development should involve the use of stored procedures. A popular article on the Simple-Talk Web site (To SP or Not to SP in SQL Server) discussed various pros and cons of the stored procedure vs. ad hoc SQL debate, including security, performance, and maintainability. These are definitely interesting questions to consider when deciding which side of the fence you stand on, but I feel that a stronger argument can be made by revisiting some of the basic tenets of software architecture in order to determine how stored procedures might play a part in a properly designed system.
In my opinion, all external database access should occur through stored procedures. In this article, I review a few of the factors that inform this belief and discuss a way of thinking about the database that is aligned with sound software architectural practices. To begin with, a bit of background understanding is important, so I will introduce (or re-introduce) a few software architecture ideals that should be considered whenever designing any software -database, or otherwise.
Coupling, cohesion, and encapsulation
As a database developer, it can be tempting to place yourself in a “data bubble” and ignore the wide world of applications and users around you. The database may exist simply to store and retrieve data, but in order to do so it must interact with those who need the data therein. There are three concepts that I believe every database-oriented software developer must understand in order to build correctly designed databases that can interoperate well:
- Coupling refers to the degree of dependency of one module in a system upon another module in the system. It can also refer to the degree of dependency that exists between systems. Modules, or systems, are said to be tightly coupled when they depend on each other to such an extent that a change in one necessitates a change to the other. Software developers should strive instead to produce the opposite: loosely coupled modules and systems.
- Cohesion refers to the degree that a particular module or subsystem provides a single functionality to the application as a whole. Strongly cohesive modules, which have only one function, are said to be more desirable than weakly cohesive modules that do many operations and therefore may be less maintainable and reusable.
- Encapsulation refers to how well the underlying implementation is hidden by a module in a system. This concept is essentially the juxtaposition of loose coupling and strong cohesion. Logic is said to be encapsulated within a module if the module’s methods or properties do not expose design decisions about its internal behaviors. A properly designed module should control its own data and rules internally-and not have to rely on any consumer to properly do so. The idea here is to implement the logic exactly once and reuse it as many times as necessary, instead of implementing the logic wherever it needs to be used.
Unfortunately, these definitions are somewhat ambiguous, and even in real systems there is a definite degree of subjectivity that goes into determining whether a given module is or is not tightly coupled to some other module, whether a routine is cohesive, or whether logic is properly encapsulated. There is no objective method of measuring these concepts within an application. Generally, developers will discuss these ideas using comparative terms-for instance, a module may be said to be less tightly coupled to another module than it was before its interfaces were refactored. But it might be difficult to say whether or not a given module is tightly coupled to another, without some means of comparing the nature of its coupling.
For more information on these and related topics, please refer to the following Wikipedia articles:
The only purpose of a module in an application is to do something at the request of a consumer (i.e., another module or system). For instance, a database system would be worthless if there were no way to store or retrieve data. Therefore, a system must expose interfaces, well-known methods and properties that other modules can use to make requests. A module’s interfaces are the gateway to its functionality, and these are the arbiters of what goes into, or comes out of, the module.
Interface design is where the concepts of coupling and encapsulation really take on meaning. If an interface fails to encapsulate enough of the module’s internal design, consumers may rely upon some knowledge of the module, thereby tightly coupling the consumer to the module. Any change to the module’s internal implementation may require a modification to the implementation of the consumer. An interface can be said to be a contract expressed between the module and its consumers. The contract states that if the consumer specifies a certain set of parameters to the interface, a certain set of values will be returned. Simplicity is usually the key here; avoid defining interfaces that modify return-value types based on inputs. For instance, a stored procedure that returns additional columns if a user passes in a certain argument may be an example of a poorly designed interface.
Many programming languages allow routines to define explicit contracts. This means that the input parameters are well-defined, and the outputs are known at compile-time. Unfortunately, T-SQL stored procedures only define inputs, and the procedure itself can dynamically change its defined outputs. It is up to the developer to ensure that the expected outputs are well-documented and that unit tests exist to validate them. I refer to a contract enforced via documentation and testing as an implied contract.
Stored procedures: the database-as-API mindset
An Application Programming Interface (API) is a set of interfaces that allows a system to interact with another system. An API is intended to be a complete access methodology for the system it exposes. In database terms, this means that an API would expose public interfaces for retrieving data from, inserting data into, and updating data in the database.
My contention is that this set of database interfaces should comply with the same basic design rules as other interfaces: well-known, standardized sets of inputs that result in well-known, standardized sets of outputs. This set of interfaces should completely encapsulate all implementation details, including table and column names, keys, indexes, and queries. An application that uses the data from a database should not require knowledge of internal information-the application should only need to know that data can be retrieved and persisted using certain methods. I call this the “database-as-API” mindset.
In order to define such an interface, the first step is to define stored procedures for all external database access. Table-direct access to data is clearly a violation of proper encapsulation and interface design, and views may or may not suffice. Stored procedures are the only construct available in SQL Server that can provide the type of interfaces necessary for a comprehensive data API.
So, in short, I believe that all data access should be via a fully-defined API, implemented using stored procedures. Following are the advantages I believe are gained by this approach.
Ease of maintenance
Perhaps the biggest advantage to the “database-as-API” approach is the reduction in coupling that can be achieved between the object system and the database it uses as a back-end.
If a change to the database requires an application change, it can often be expensive to recompile and redeploy the application. Likewise, if application logic changes necessitate database changes, it can be difficult to know how changing the data structures or constraints will affect other applications that may need the same data. However, by using stored procedures with correctly defined interfaces and full encapsulation of information, coupling between the application and the database can be greatly reduced. Changes to one layer do not necessitate changes to the other layer, resulting in a database system that is much easier to maintain and evolve over time.
By reducing dependencies and thinking of the database as a data API rather than a simple application persistence layer, you can also arrive at a much more flexible application development process. Often, this can permit the database and application layers to be developed in parallel rather than in sequence, thereby allowing for greater scale-out of human resources on a given project.
Ease of testing
Another issue is testability. If stored procedures are properly defined, with well-documented and consistent outputs, testing is not at all hindered – unit tests can be easily created in order to support test-driven development (TDD). Furthermore, support for more advanced testing methodologies also becomes easier, not more difficult, thanks to stored procedures. For instance, consider use of mock objects, which are faÃ§ade methods that TDD practitioners create, which return specific known values. These are then substituted for real methods in testing scenarios such that testing any given method does not test any methods that it calls (any calls made from within the method being tested will actually be a call to a mock version of the method). This technique is actually much easier to implement for testing of data access when stored procedures are used, as mock stored procedures can easily be created and swapped in and out without disrupting or recompiling the application code being tested.
Yet another important issue is security. Ad hoc SQL (as well as dynamic SQL) presents various security challenges, including opening possible attack vectors (i.e., points of entry for an attacker) and making data access security much more difficult to enforce declaratively, rather than programmatically. This means that by using ad hoc SQL your application may be more vulnerable to being hacked, and you may not be able to rely on SQL Server to secure access to data. The end result is that a greater degree of testing will be required in order to ensure that security holes are properly patched and that unauthorized users cannot access data they’re not supposed to.
Note that SQL injection holes (where an attacker attempts to re-write server-side SQL by manipulating a client side text box) are equally problematic for both ad hoc SQL sent by applications and dynamic SQL used within stored procedures. In both cases, the key to solving this problem is use of proper parameterization. However, SQL injection is not the only kind of security problem that can occur. Stored procedures can help to combat such problems as escalation attacks, in which an attacker gains rights to resources other than those for which he is authorized. By using stored procedures, security is encapsulated within the data access interface. A pure ad hoc SQL solution, on the other hand, requires the application to programmatically ensure that authorization and authentication are properly enforced.
Finally, I will address the issue towards which online debates always seem to gravitate: performance. Proponents of ad hoc SQL make the valid claim that thanks to better support for query plan caching in SQL Server 2000 and 2005, stored procedures no longer offer much of a performance benefit (please note, this is only true if ad hoc or dynamic SQL is properly used – again, that means correct parameterization!). Although this sounds like a great argument for not having to use stored procedures, I personally believe that it is a non-issue. Given equivalent performance, I think the obvious choice is the more maintainable and secure option (i.e., stored procedures).
The stored procedure vs. ad hoc SQL question can become quite complex, with proponents of rapid software development methodologies such as TDD claiming that stored procedures slow down their process, and fans of object-relational mapping (ORM) technologies making claims about the benefits of those technologies over stored procedures. It does not help that many of the combatants in these battles happen to have a vested interest in ORM; some of the most heated debates in recent memory were started by inflammatory claims made by vendors of ORM tools. Many in the ORM community feel that the database should be used as nothing more than a very simple object persistence layer, and would probably be perfectly happy with a database that only had a single table with only two columns: a GUID to identify an object’s ID, and an XML column for the serialized object graph.
In the end, the stored procedure vs. ad hoc SQL question is really one of purpose. In my eyes, a database is much more than just a collection of data. It is also an enforcer of data rules, a protector of data integrity, and a central data resource that can be shared amongst multiple applications. For these reasons, I firmly believe that a decoupled, stored procedure-based design is the only way to go.
Adam Machanic is an independent database software consultant, based in Boston, Massachusetts. He has implemented SQL Server solutions for a variety of high-availability OLTP and large-scale data warehouse applications, and also specializes in .NET data access layer performance optimization. Adam is co-author of Pro SQL Server 2005 and is currently in the process of solo-authoring “Expert SQL Server 2005”. He regularly speaks at user groups and community events throughout New England.