7 Common Database Design Errors
Why Talk About Errors?
The art of designing a good database is like swimming: it is relatively easy to start and difficult to master. If you want to learn to design databases, you certainly need some theoretical background, such as knowledge of database normal forms and transaction isolation levels. But you should also practice as much as possible, because the sad truth is that we learn most… by making errors.
In this article we will try to make learning database design a little simpler by showing some common errors people make when designing their databases.
Note that we will not talk about database normalization – we assume the reader knows database normal forms and has a basic knowledge of relational databases. Whenever possible, covered topics will be illustrated by models generated using Vertabelo and practical examples.
This article covers designing databases in general, but with emphasis on web applications, so some examples may be web application-specific.
Model Setup
Let’s assume we want to design a database for an online bookstore. The system should allow customers to perform the following activities:
- browse and search books by book title, description and author information,
- comment on books and rate them after reading,
- order books,
- view the status of order processing.
So the initial database model could look like this:
To test the model, we will generate SQL for the model using Vertabelo and create a new database in PostgreSQL RDBMS.
The database has eight tables and no data in it, so we have populated it with some manually created test data. Now the database contains some exemplary data and we are ready to start the model inspection, including identifying potential problems that are invisible now but can arise in the future, when the system is used by real customers.
1 – Using Invalid Names
Here you can see that we named a table with the word “order”. However, as you probably remember, “order” is a reserved word in SQL! So if you try to issue this SQL query:
```sql
SELECT * FROM ORDER ORDER BY ID
```
the database management system will complain. Luckily enough, in PostgreSQL it is sufficient to wrap the table name in double quotes, and it will work:
```sql
SELECT * FROM "order" ORDER BY ID
```
Wait, but “order” here is lowercase!
That’s right, and it is worth digging deeper. If you wrap something in double quotes in SQL, it becomes a delimited identifier and most databases will interpret it in a case-sensitive way. As “order” is a reserved word in SQL, Vertabelo generated SQL which wrapped it automatically in double quotes:
```sql
CREATE TABLE "order" (
    id int NOT NULL,
    customer_id int NOT NULL,
    order_status_id int NOT NULL,
    CONSTRAINT order_pk PRIMARY KEY (id)
);
```
But since the identifier was wrapped in double quotes and written in lower case, the table name remained lower case. Now, to complicate things even more, I can create another table, this time named ORDER (in uppercase), and PostgreSQL will not detect a naming conflict:
```sql
CREATE TABLE "ORDER" (
    id int NOT NULL,
    customer_id int NOT NULL,
    order_status_id int NOT NULL,
    CONSTRAINT order_pk2 PRIMARY KEY (id)
);
```
The magic behind this is that if an identifier is not wrapped in double quotes, it is called an “ordinary identifier” and is automatically converted to upper case before use – this is required by the SQL-92 standard. But identifiers wrapped in double quotes – so-called “delimited identifiers” – are required to stay unchanged.
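Note that PostgreSQL actually deviates from the standard here: it folds ordinary identifiers to lower case, not upper case, though the ordinary-vs-delimited distinction works exactly the same way. A quick sketch of the folding behavior:

```sql
CREATE TABLE Foo (id int);  -- ordinary identifier: PostgreSQL folds the name to foo
SELECT * FROM FOO;          -- works – FOO is folded to foo as well
SELECT * FROM "Foo";        -- fails – the delimited name "Foo" was never created
```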
The bottom line is – don’t use keywords as object names. Ever.
Did you know that the maximum name length in Oracle is 30 characters (raised to 128 in Oracle 12.2)?
The topic of giving good names to tables and other elements of a database – and by good names I mean not only “not conflicting with SQL keywords”, but also self-explanatory and easy to remember – is often heavily underestimated. In a small database like ours it is not a very important matter. But when your database reaches 100, 200 or 500 tables, you will find that consistent and intuitive naming is crucial to keeping the model maintainable during the lifetime of the project.
Remember that you name not only tables and their columns, but also indexes, constraints and foreign keys. You should establish naming conventions for these database objects. Also remember that there are limits on the length of their names – the database will complain if you give your index a name that is too long.
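To make this concrete: PostgreSQL truncates identifiers longer than 63 bytes and emits a notice rather than an error, while Oracle versions before 12.2 reject names over 30 characters outright. A sketch in PostgreSQL (that the book table has a title column is an assumption about the model):

```sql
-- This 64-character name exceeds PostgreSQL's 63-byte identifier limit;
-- PostgreSQL truncates it and prints a NOTICE:
CREATE INDEX this_is_a_very_long_index_name_that_exceeds_the_postgres_limit_x
    ON book (title);
```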
Hints:
- keep names in your database:
- possibly short,
- intuitive and as correct and descriptive as possible,
- consistent;
- avoid using SQL and database engine-specific keywords as names;
- establish naming conventions (read more about planning and setting up a naming convention for your database); a sketch of one possible convention follows.
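For illustration, here is one hypothetical convention – table name, then column name(s), then a suffix like pk, fk or idx – applied to the model’s “order” table. The convention itself is just an example, and that the customer table has an id primary key is an assumption:

```sql
-- A naming-convention sketch: <table>_<column(s)>_<suffix>
CREATE INDEX order_customer_id_idx ON "order" (customer_id);

ALTER TABLE "order"
    ADD CONSTRAINT order_customer_id_fk
    FOREIGN KEY (customer_id) REFERENCES customer (id);  -- customer(id) assumed
```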
Here is the model with the “order” table renamed to “purchase”:
Changes in the model were as follows:
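In PostgreSQL, the equivalent change on a live database is a one-line rename (note that dependent objects such as constraints keep their old names and may need renaming separately):

```sql
-- Rename the table; constraints like order_pk keep their old names:
ALTER TABLE "order" RENAME TO purchase;
```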
2 – Insufficient Column Width
Let’s inspect our model further. As we can see in the “book_comment” table, the “comment” column’s type is character varying(1000). What does this mean?
If the field is plain text in the GUI (customers can enter only unformatted comments), then it simply means that the field can store up to 1000 characters of text. If that is the case, there is no error here.
But if the field allows some formatting, like BBCode or HTML, then the exact number of characters a customer can enter is in fact unknown. If they enter a simple comment, like this:
```
I like that book!
```
then it will take only 17 characters. But if they format it using bold font, like this:
```
I <b>like</b> that book!
```
then it will take 24 characters to store while the user will see only 17 in the GUI.
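You can verify the difference directly in SQL – the stored length counts the markup, not what the user sees:

```sql
-- length() counts every stored character, tags included:
SELECT length('I like that book!')        AS plain_chars,   -- 17
       length('I <b>like</b> that book!') AS marked_chars;  -- 24
```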
So if the bookstore customer can format the comment with some kind of WYSIWYG editor, then limiting the size of the “comment” field can be harmful: the customer may hit the maximum comment length (1000 characters of raw HTML) while the character count they see in the GUI is still far below 1000. In such a situation, just change the type to text and don’t bother limiting length in the database.
However, when setting text field limits, you should always remember about text encoding.
The type varchar(100) means 100 characters in PostgreSQL but 100 bytes in Oracle!
Instead of explaining this in general, let’s look at an example. In Oracle, the varchar column type is limited to 4000 bytes, and it is a hard limit – there is no way to exceed it. So if you define a column of type varchar(3000 char), you can store 3000 characters in it, but only if the text does not take more than 4000 bytes on disk. If it exceeds the limit, Oracle will throw an error when attempting to save the data. And why could a 3000-character text exceed 4000 bytes on disk? If it is in English, it cannot. But in other languages it is possible. For example, if you try to save the word “mother” in Chinese – 母親 – and your database encoding is UTF-8, then the string will have 2 characters but take 6 bytes on disk.
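In PostgreSQL you can check both measures directly (assuming the database encoding is UTF-8):

```sql
-- char_length counts characters, octet_length counts encoded bytes:
SELECT char_length('母親')  AS characters,  -- 2
       octet_length('母親') AS bytes;       -- 6
```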
BMP (Basic Multilingual Plane, Unicode Plane 0) is the set of characters that can be encoded using 2 bytes per character in UTF-16. Luckily, it covers most characters used around the world.

Note that different databases can have different limitations for varying character and text fields. Some examples:
- Oracle has the aforementioned limit of 4000 bytes for a varchar column,
- Oracle stores CLOBs smaller than 4 KB directly in the table, and such data is accessible as quickly as any varchar column; bigger CLOBs take longer to read, as they are stored outside of the table,
- PostgreSQL allows an unlimited varchar column to store even a gigabyte-long string, silently storing longer strings in background (TOAST) tables so as not to decrease the performance of the whole table.
Hints:
- limiting the length of text columns in the database is good in general, for security and performance reasons,
- but sometimes it may be unnecessary or inconvenient to do;
- different databases may treat text limits differently;
- always remember about encoding if using a language other than English.
Here is the model with the “comment” column type in “book_comment” changed to text:
The change in the model was as follows:
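On an existing PostgreSQL database, the corresponding migration is a single statement (a sketch):

```sql
-- Widen the column from varchar(1000) to unlimited text:
ALTER TABLE book_comment ALTER COLUMN comment TYPE text;
```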
3 – Not Indexing Properly
There is a saying that “greatness is achieved, not given”. It is the same for performance – it is achieved by careful design of the database model, tuning of database parameters, and by optimizing queries run by the application on the database. Here we will focus on the model design, of course.
In our example, let’s assume that the GUI designer of our bookstore decided that the 30 newest comments will be shown on the home screen. To select these comments, we will use the following query:
```sql
select comment, send_ts from book_comment order by send_ts desc limit 30;
```
How fast does this query run? It takes less than 70 milliseconds on my laptop. But if we want our application to scale (work fast under heavy load), we need to check it against bigger data. So let’s insert significantly more records into the “book_comment” table. To do so, I will use a very long word list and turn it into SQL using a simple Perl command; a pure-SQL alternative is sketched below.
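If you don’t have a word list at hand, a similar data volume can be generated with PostgreSQL’s generate_series. In this sketch the id, book_id and customer_id columns – and the existence of book 1 and customer 1 – are assumptions about the model, so adjust it to your actual schema:

```sql
-- Bulk-insert synthetic comments; only comment and send_ts are known
-- from the article, the remaining columns are assumed:
INSERT INTO book_comment (id, book_id, customer_id, comment, send_ts)
SELECT i,
       1,                                   -- assumed existing book
       1,                                   -- assumed existing customer
       'test comment number ' || i,
       now() - (i || ' seconds')::interval  -- spread send_ts over time
FROM generate_series(1, 600000) AS i;
```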
Now I will import this SQL into the PostgreSQL database, and while it is importing, I will check the execution time of the previous query. The results are summarized in the table below:
| Rows in “book_comment” | Time of query execution [s] |
|---|---|
| 10 | 0.067 |
| 160,000 | 0.140 |
| 200,000 | 0.200 |
| 430,000 | 0.286 |
| 500,000 | 0.327 |
| 600,000 | 0.362 |
As you can see, as the number of rows in the “book_comment” table grows, it takes proportionally longer to return the newest 30 rows. Why does it take longer? Let’s see the query plan:
```
db=# explain select comment, send_ts from book_comment order by send_ts desc limit 30;
                                 QUERY PLAN
```
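The plan output is cut off here, but the section title already names the fix: without an index on send_ts, the database has to scan and sort the whole table on every request. A sketch of the standard remedy:

```sql
-- With an index on send_ts, the newest 30 rows can be read from the
-- end of the index instead of sorting the entire table:
CREATE INDEX book_comment_send_ts_idx ON book_comment (send_ts);
```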