Organizations that support data-intensive applications must make many decisions about how to best implement and maintain them. One of the biggest decisions is to determine the best platforms to use for storing and delivering the application data. In the past, most organizations opted for SQL databases because of their ability to protect data and ensure its integrity. But the rise of the internet and cloud technologies—and the proliferation of data that went with them—has caused many organizations to turn to NoSQL databases, in large part because they can better handle the abundance of unstructured and semi-structured data.
Despite this trend, many IT teams continue to support more traditional workloads, often in conjunction with their modern applications, and it’s not always clear which type of database systems they should choose—SQL or NoSQL. Both offer advantages and disadvantages, but they differ in how they’re built, how they store data, and how applications access them. And it’s only by understanding these differences can an organization make an informed decision about which type will best suit their workloads now and in the foreseeable future.
Introducing SQL databases
An SQL database is commonly referred to as a relational database because it’s based on the relational model introduced by Edgar F. Codd in the early 1970s. The relational model defines a methodology for organizing structured data into relations (tables with columns and rows) and for defining the relationships between those tables. Of course, there’s a lot more to relational theory than this—and I’ve been fairly loose with its terminology—but the important point to know is that the relational database, after its introduction, soon became the de facto standard for storing and managing data in organizations of all sizes and remains a prevalent technology to this day.
At the heart of the relational model is the Structured Query Language (SQL), a standards-based programming language used to define database schema and the relationships between tables. The language is also used to store, manipulate, and retrieve data from those tables. SQL has been adopted by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) and is well known and widely supported by developers around the globe.
Relational databases offer many important features that make them aptly suited to enterprise workloads, which is why organizations have been turning to them for so long. They’re optimized for handling highly structured data, and their inherent characteristics—such as normalization, atomicity, and consistency—ensure the integrity of that data throughout its lifespan. These features also contribute to better storage utilization, while providing flexible query support through standards-based SQL.
However, relational databases are not without their challenges. They’re great for structured data but not so great for semi-structured or unstructured data, especially at scale. In fact, SQL database can be difficult to scale horizontally, even for structured data, making it difficult to use them for distributed big data workloads. A relational database also requires a rigid schema that must be carefully planned and does not easily accommodate changing requirements, leaving little room for the dynamic nature of many of today’s applications and development methodologies.
Despite these challenges, however, SQL databases remain popular choices for many organizations, with vendors offering an assortment of sophisticated relational database products, such as Microsoft SQL Server, Oracle Database, IBM DB2, MySQL, PostgreSQL, and many others.
Introducing NoSQL databases
Because of the influx of unstructured and semi-structured data, many organizations have been opting for NoSQL databases, a term generally taken to mean “not SQL” or “not only SQL.” Although such naming is somewhat vague, it points to a category of databases that are more flexible and scalable than traditional SQL databases. NoSQL databases don’t adhere to the rigid schema structure inherent in relational databases, nor are they restricted to a single data model like these databases.
In fact, there is nothing to say that a NoSQL database has to follow any particular model, and the industry remains dynamic enough to imagine any possible scenarios. Even so, four basic NoSQL models have emerged, with products available in each one:
- Key-value databases. Data is stored in a key-value structure that associates unique identifiers with specific data blobs, making it possible to store just about any type of data in whatever form it’s delivered. Example products include Redis, Berkeley DB, and Amazon DynamoDB.
- Document databases. Data is stored as documents in a format such as JSON or XML, with each document assigned its own unique key, but without being limited to predefined fields or elements. Example products include MongoDB, CouchDB, and BaseX.
- Column-oriented databases. Data is stored as strongly-typed columns rather than rows, making it possible to query and aggregate large volumes of data very quickly. These types of databases go by other names as well, such as column-store, wide-column store, and column-family. Example products include HBase, Bigtable, and Cloudera.
- Graph databases. Data is stored in graph structures that define how the data is interconnected, essentially treating the relationships between the data as important as the data itself. Product examples include Neo4J, and InfiniteGraph.
Each type of NoSQL database offers advantages and disadvantages, depending on the workloads an organization is trying to support. In addition, not all NoSQL products fit neatly into each of these categoris. For example, Azure Cosmos DB provides five APIs that make the service more universal. These include the SQL API, API for MongoDB, Cassandra API, Gremlin API, and Table API.
Regardless of the product, however, any credible NoSQL database should be able to scale horizontally and handle distributed big data workloads while offering far more flexibility than SQL databases. NoSQL databases also make it easier for development teams to get started on their projects because they’re not locked into rigid schema structures. That said, NoSQL databases are not as mature as relational database products and typically cannot guarantee the same levels of data integrity.
Comparing SQL and NoSQL databases
When deciding between SQL and NoSQL databases, it can help to see a side-by-side comparison of the two types to better understand their differences. The following table breaks down many of the main characteristics that set SQL and NoSQL apart.
SQL databases |
NoSQL databases |
|
Data structure |
The SQL data structure is based on a relational model that normalizes data across strictly defined tables and standardizes the relationships between those tables, making SQL databases well suited to highly structured data. |
The NoSQL data structure does not require a normalized configuration or adhere to a relational model but is instead flexible enough to accommodate different models, including key-value, document, column-oriented, and graph. |
Language |
SQL databases are all about the SQL language. Some relational database products support pure SQL, but many include enhanced versions of the language—such as SQL Server’s Transact-SQL (T-SQL)—to accommodate product-specific features. However, all SQL databases support the core ANSI/ISO language elements. |
NoSQL databases are not locked into one language. The language used depends on the type of NoSQL database, the individual implementation, and the specific operation. For example, MongoDB stores all documents in a JSON format, with queries based on the JavaScript programming language. |
Schemas |
An SQL database requires a predefined schema that determines how tables are configured and data is stored, resulting in a rigid structure that helps to optimize storage and ensure data integrity, but limits flexibility. |
A NoSQL database uses a dynamic schema that requires no predefined data structure, resulting in a a high degree of flexibility, such as being able to add documents with different fields to the same database. |
Data integrity |
SQL databases deliver a high degree of data integrity, adhering to the principles of atomicity, consistency, isolation, and durability (ACID), which are essential when supporting workloads such as financial transactions. |
It can be difficult for NoSQL databases to deliver the same level of data integrity as SQL databases, with most adhering to BASE principles (basic availability, soft state, and eventual consistency), which means data in a distributed environment might be temporarily inconsistent. |
Scalability |
SQL databases primarily scale vertically, which means they can be easily scaled up by adding resources such as CPUs or memory, but SQL databases are not very efficient at scaling horizontally, making them ill-suited for large, distributed data sets. |
NoSQL databases can scale horizontally very efficiently across systems and locations, making it possible to accommodate large stores of distributed data, while supporting increased levels of traffic. |
Querying |
SQL databases are efficient at processing queries and joining data across tables, making it easier to perform complex queries against structured data, including ad hoc requests. |
NoSQL databases lack consistency across products and typically require more work to query data, particular as query complexity increases. |
Maturity |
SQL databases are built on mature technologies that are well known and supported by large developer communities. |
NoSQL products are not as mature and the technologies not as well supported as SQL products, but NoSQL technologies are making fast inroads into the industry, with developer communities constantly growing. |
As handy as it can be to compare SQL and NoSQL databases in this way, the differences between them are not always so black-and-white. Vendors have been steadily incorporating features into their products to make them more universal. For example, MongoDB now supports multi-document ACID transactions, and MySQL now includes a native JSON data type for storing and validating JSON documents.
How to choose between SQL and NoSQL databases
The decision between SQL and NoSQL will depend largely on the workloads you plan to support and the structure and amount of data. However, you should also consider the differences in the database products themselves, such as maturity, stability, licensing fees, vendor support, and the extent and participation of the developer communities.
In the meantime, the following table provides a few general guidelines you might consider when weighing one type against the other.
Consider SQL databases when… |
Consider NoSQL databases when… |
|
|
Again, these are just guidelines. You should consider each situation individually, taking into account the shape of your data and workload requirements. At the same time, keep in mind that you’re not limited to one database type over the other. Many organizations have implemented both SQL and NoSQL database systems to meet their different requirements, making it possible to get the best of both worlds. Also, keep in mind that database technologies are continuously evolving, with new variables regularly being added to the mix.
Regardless of how the industry is changing, however, one thing remains certain: the more thoroughly you understand your data and the available database options, the more informed a decision you can make when choosing a database system and the better you’ll be able to support your workloads going forward.
Load comments