Storage 101: Welcome to the Wonderful World of Storage

Understanding all the storage options available today is not easy. In this new series, Robert Sheldon will explain what you need to know to make sure that you get the right storage for your servers.

The series so far:

  1. Storage 101: Welcome to the Wonderful World of Storage
  2. Storage 101: The Language of Storage
  3. Storage 101: Understanding the Hard-Disk Drive 
  4. Storage 101: Understanding the NAND Flash Solid State Drive
  5. Storage 101: Data Center Storage Configurations
  6. Storage 101: Modern Storage Technologies
  7. Storage 101: Convergence and Composability 
  8. Storage 101: Cloud Storage
  9. Storage 101: Data Security and Privacy 
  10. Storage 101: The Future of Storage
  11. Storage 101: Monitoring Storage Metrics
  12. Storage 101: RAID

Gone are the days when implementing storage was simply a matter of standing up a few disk drives. Today’s data-intensive workloads require a variety of storage solutions to handle the unprecedented amounts of heterogeneous data. At the same time, storage technologies are quickly evolving, with new options available every day. Administrators can now choose from a wide range of storage types and configurations. Unfortunately, it’s not always clear which ones might best meet their needs.

This article is the first in a series that examines the diverse world of today’s storage solutions. In this series, I plan to cover everything from storage fundamentals, such as device types and storage metrics, to more advanced topics, ranging from hyperconvergence to intelligent storage. The goal is to provide you with a foundation for better understanding current storage technologies and how to use them in your organization to support your data-intensive workloads. This article provides a starting point toward that goal by introducing you to basic concepts that serve as building blocks for today’s storage solutions.

Storage Media

Two types of storage media support most of today’s application workloads: hard-disk drives (HDDs) and solid-state drives (SSDs). Both are forms of non-volatile storage. In other words, they can persist data even if the power gives out, unlike traditional random-access memory (RAM), which is much faster, but also costlier and, of course, volatile.

The HDD has long had the reputation as the data center’s go-to workhorse for supporting an assortment of applications. Over the years, it has steadily evolved to handle greater volumes of data while delivering better read and write performance.

The HDD contains one or more spinning platters on which the data is stored. The platters are protected within a sealed casing, along with other components, such as the magnetic heads that read and write the data. Today’s HDDs can store 16 TB or more of data and, despite the wave of SSDs, continue to play a vital role in enterprise storage.

Even so, the SSD has made significant inroads into data centers of all sizes, in part because of increased demand for high-performing storage, but also because of technological improvements and plummeting prices.

Unlike the HDD, the SSD includes no moving parts and instead stores data on interconnected silicon chips, resulting in better performance, while requiring less space and power. Currently, most SSD storage is based on NAND flash technologies, in which data bits are stored in cells as electrical charges.

Both HDDs and SSDs come with advantages and disadvantages (a discussion I’ll save for later in the series). For this reason, many organizations opt to use both, employing SSDs for high-performing workloads and sticking with HDDs for the rest.

Some organizations have also implemented hybrid arrays, storage systems that incorporate both HDDs and SSDs. The exact ratio of HDDs to SSDs, as well as the array’s configuration, depends on the hybrid solution itself, but the result is the same. You get the benefits of both storage types while minimizing their disadvantages.

Tape storage is also still in use in some organizations, mostly for backup and archiving purposes. In a tape storage system, the data is saved to magnetic tape, which is usually encased in some type of cassette or cartridge. Tape storage is not particularly fast or efficient, in terms of supporting data-intensive workloads, but it can store large quantities of data at a relatively low price when compared to the other storage options.

Interfaces, Form Factors, and Storage Protocols

When it comes to storage, few topics cause as much confusion as the terms interface, form factor, and storage protocol. Throughout the industry, these terms are used interchangeably, inconsistently, and imprecisely when describing storage-related technologies. As it turns out, there’s a good reason for this. Many of these technologies don’t fit neatly into any one of these categories or into other categories for that matter, which makes it possible to describe them in multiple ways.

For example, Serial Advanced Technology Attachment (SATA) is commonly referred to as a connection interface. SATA is one of the most popular interfaces out there, supporting a wide variety of HDD and SSD storage devices. Compared to its predecessor—Parallel Advanced Technology Attachment (PATA)—SATA is faster, requires smaller physical connectors, and supports hot-swapping capabilities.

In addition to being referred to as an interface, SATA is also referred to as a form factor and a protocol. This is because SATA is as much a standard as it is an interface and, as such, defines a complete methodology for connecting storage devices to computer systems and for transferring data. As a result, the term SATA is often used to reference components throughout the storage stack, including the transport layer, the physical connectors, and the storage devices themselves.

To help bring order to this muddle, at least for the purposes of this series, I use the term interface to refer specifically to how a storage device connects to a computer. The interface defines the physical and logical characteristics for enabling data transfers, which can include everything from the computer’s bus connectors to the signaling technology that drives data transfers.

Related to the interface is the form factor, which refers to the size and shape of the storage device, and the protocol, which is a set of rules that define how the storage device and computer communicate with each other. A storage device of a specific form factor connects to the computer via an interface, using a protocol to determine how to pass data to and from the computer.

Let’s look at an example to help clarify how all this works. Another technology that qualifies as an interface is Peripheral Component Interconnect Express (PCIe), which, unlike other interfaces, makes it possible to connect a storage device directly to the motherboard. Because of this direct connection, PCIe can boost performance and reduce latencies. In addition, PCIe expansion slots come in multiple configurations, which are based on the number of data lanes. For example, an x16 expansion slot uses 16 data lanes.
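
To put those lane counts in perspective, here’s a quick back-of-the-envelope sketch in Python. The per-lane throughput figures are rounded, one-direction approximations after encoding overhead, not exact values from the PCIe specifications:

```python
# Back-of-the-envelope PCIe bandwidth arithmetic. Per-lane rates are rounded,
# one-direction figures after encoding overhead (PCIe 2.0 uses 8b/10b encoding,
# PCIe 3.0 and 4.0 use 128b/130b).
PER_LANE_GBPS = {   # approximate gigabytes per second, per lane, per direction
    "PCIe 2.0": 0.5,
    "PCIe 3.0": 0.985,
    "PCIe 4.0": 1.969,
}

def link_bandwidth(generation: str, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe link in GB/s."""
    return PER_LANE_GBPS[generation] * lanes

for gen in PER_LANE_GBPS:
    # An x4 link is typical for NVMe SSDs; x16 is typical for graphics cards.
    print(f"{gen}: x4 ~ {link_bandwidth(gen, 4):.1f} GB/s, "
          f"x16 ~ {link_bandwidth(gen, 16):.1f} GB/s")
```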

The PCIe interface has been around for several years, primarily supporting peripheral devices such as graphics and network cards. More recently, the PCIe interface has found a home with the SSD, which can take better advantage of the faster data transfer speeds than the typical HDD, especially with the introduction of the Non-Volatile Memory Express (NVMe) protocol.

NVMe is a relatively new protocol developed from the ground up to accommodate the needs of PCIe-connected SSD storage devices. The protocol leverages parallel, low-latency data paths to enable high-performing transfers.
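
If you’re curious how NVMe devices appear to the operating system, the following Python sketch lists NVMe controllers on a Linux host by reading sysfs. The paths and attribute names reflect the common Linux NVMe driver rather than anything in the NVMe standard itself, so treat them as an assumption about your environment:

```python
# List NVMe controllers on a Linux host by reading sysfs. The /sys/class/nvme
# layout and the model/serial/firmware_rev attributes come from the Linux NVMe
# driver; other operating systems expose this information differently.
from pathlib import Path

def list_nvme_controllers(sysfs_root: str = "/sys/class/nvme"):
    root = Path(sysfs_root)
    if not root.exists():
        return []  # no NVMe devices, or not a Linux system
    controllers = []
    for ctrl in sorted(root.iterdir()):
        info = {"name": ctrl.name}
        for attr in ("model", "serial", "firmware_rev"):
            attr_path = ctrl / attr
            if attr_path.exists():
                info[attr] = attr_path.read_text().strip()
        controllers.append(info)
    return controllers

if __name__ == "__main__":
    for controller in list_nvme_controllers():
        print(controller)
```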

At the same time, NVMe is more than just a protocol. It’s a communication standard that defines multiple components, including a register interface and command set, as well as a management interface. Even so, NVMe is commonly referred to as a storage protocol, although I’ve also seen it referred to as an interface and form factor. (Confused yet?)

In recent years, the PCIe/NVMe duo has become extremely popular. The storage market is now flush with SSD products that fit into PCIe slots and use the NVMe protocol to transfer data. And because PCIe supports multiple slot and connector configurations, these SSDs are available in a variety of form factors.

For example, many vendors now offer SSDs in the M.2 form factor, which comes in a variety of sizes, such as 22mm x 30mm, 22mm x 80mm, or 30mm x 42mm. What makes the M.2 form factor especially versatile is that it’s available for both the SATA interface and the PCIe interface. A PCIe-based SSD in the M.2 form factor plugs into an M.2 socket that carries PCIe lanes (or into a standard PCIe slot by way of an adapter card). The SSD uses the PCIe interface to connect to the computer and uses the NVMe protocol to carry out communications and transfer data.
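
One way to keep the three terms straight is to treat them as separate attributes of the same device. The following Python sketch is purely illustrative (the class, field names, and sample models are invented), but it shows how the M.2 NVMe drive from the example differs from a SATA drive only in which interface, form factor, and protocol it pairs together:

```python
# Purely illustrative: the class, field names, and sample models are made up,
# but they show how interface, form factor, and protocol describe different
# aspects of the same device.
from dataclasses import dataclass

@dataclass
class StorageDevice:
    model: str        # hypothetical model name, not a real product
    interface: str    # how the device connects to the computer
    form_factor: str  # physical size and shape of the device
    protocol: str     # rules governing how device and host communicate

m2_nvme_ssd = StorageDevice(
    model="ExampleCo NVMe 1TB",
    interface="PCIe 3.0 x4",
    form_factor="M.2 (22mm x 80mm)",
    protocol="NVMe",
)

sata_ssd = StorageDevice(
    model="ExampleCo SATA 500GB",
    interface="SATA III",
    form_factor="2.5-inch",
    protocol="ATA command set (AHCI host interface)",
)

print(m2_nvme_ssd)
print(sata_ssd)
```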

Not surprisingly, the topic of interfaces, form factors, and storage protocols is far more involved than what I’ve covered here. Later in this series, I’ll dig into these technologies in more depth. But for now, let’s move on to storage configurations.

Storage Configurations

For the purposes of this article, when I talk about storage configurations, I’m referring primarily to direct-attached storage (DAS), network-attached storage (NAS), and the storage area network (SAN), with cloud storage thrown into the mix to shake things up.

As the name suggests, DAS refers to storage devices that attach directly to a computer through one of the common interfaces, such as SATA, PCIe, USB, or Thunderbolt. DAS is the most basic of the three configurations and does not support many of the advanced features available to the other configurations. At the same time, DAS is usually the easiest to implement and maintain and the least expensive. However, DAS is not particularly scalable, which tends to limit its use to small-business setups that share data locally.

In general, DAS has had a diminishing role in supporting many of today’s more robust applications. However, the emergence of the hyperconverged infrastructure (HCI) has breathed new life into DAS. Most HCI platforms are made up of multiple server nodes, each with its own storage. The storage is abstracted across all nodes to create logical resource pools available to applications running in the HCI environment.

For many workloads, IT teams are more inclined to implement a NAS (network-attached storage) solution, which is commonly deployed as an array of storage devices connected through the local area network (LAN). The solution has its own processing and memory resources, along with its own operating system (OS) and supporting software. Compared to DAS, a NAS solution is more scalable, and it supports more advanced features, such as thin provisioning and snapshots. But it’s also more expensive.

Further up the ranks is the SAN (storage area network), a high-performing storage solution typically used in larger data centers to support enterprise workloads and mission-critical applications. A SAN usually runs on its own dedicated network, often based on Fibre Channel. It is more complex and expensive than the other options, but also more scalable, resilient, and performant, with consistently high throughputs and low latencies. A SAN offers the best of both DAS and NAS solutions, but at a price.

Although NAS and SAN systems have served as the backbone for most enterprise applications, many organizations are now turning to the cloud to meet their storage needs. Cloud vendors provide on-demand storage services with pay-as-you-go subscription models, helping to avoid over-provisioning and extensive up-front costs.

Cloud platforms are highly scalable, easy to manage, provide metered resources, and include built-in redundancy, but they can get quite pricey as subscription fees add up. In addition, they don’t offer the level of control you get with on-premises storage.

To address the control issue, many organizations are turning to hybrid cloud storage, in which some data is stored on-premises and other data is stored in the cloud, making it easier to ensure security, privacy, and compliance where needed. An effective hybrid solution also includes mechanisms for seamlessly managing and moving data between platforms.

Storage Types

Most storage falls into one of three categories: file, block, or object. File storage is the most basic of the three. It is the type of storage you use when you sit down at your computer and open files, save new files to disk, or inadvertently delete them. The files are stored in a hierarchical format, according to the familiar directory/subdirectory structure. Each file is tagged with a limited amount of fixed metadata, such as its name, file size, or modified date.
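
Here’s a small Python example that reads exactly that kind of fixed metadata (name, size, and modified date) using nothing but the standard library. The starting directory is a placeholder; point it at any folder you like:

```python
# Read the fixed metadata that file storage keeps for each file (name, size,
# modified date) using only the Python standard library.
from datetime import datetime
from pathlib import Path

def describe(path: Path) -> dict:
    stat = path.stat()
    return {
        "name": path.name,
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(stat.st_mtime).isoformat(),
    }

# Walk the hierarchical directory/subdirectory structure and print each
# file's fixed metadata. The "." starting point is a placeholder.
for entry in Path(".").rglob("*"):
    if entry.is_file():
        print(describe(entry))
```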

File storage is easy to work with and well understood by most PC users and applications, which is why it’s commonly used in DAS and NAS storage solutions. Unfortunately, file storage can get fairly cumbersome as the number of files grows, making it more difficult to scale data resources or find files when you need them.

Block storage addresses these challenges by breaking the data into chunks (blocks) and storing them as individual pieces, with no attached metadata except for a unique identifier. (Some would even argue that the identifier doesn’t qualify as metadata.) With block storage, it’s up to the managing application to determine how to organize and retrieve the blocks, keeping the data structure itself as simple as possible.
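
The following toy Python sketch is nothing like a real block device under the covers, but it illustrates the idea: fixed-size chunks identified only by a number, with the managing application left to remember which blocks belong together:

```python
# A toy block store: fixed-size chunks identified only by a block number,
# with the managing application responsible for remembering which blocks
# belong together and in what order.
BLOCK_SIZE = 4096  # bytes; real devices commonly use 512-byte or 4 KB blocks

class BlockStore:
    def __init__(self):
        self._blocks = {}   # block id -> raw bytes, no other metadata
        self._next_id = 0

    def write_block(self, data: bytes) -> int:
        block_id = self._next_id
        self._blocks[block_id] = data
        self._next_id += 1
        return block_id

    def read_block(self, block_id: int) -> bytes:
        return self._blocks[block_id]

store = BlockStore()
payload = b"x" * 10_000  # 10 KB of data spans three 4 KB blocks
block_ids = [store.write_block(payload[i:i + BLOCK_SIZE])
             for i in range(0, len(payload), BLOCK_SIZE)]

# Only the application knows that these block ids, in this order, reassemble
# the original payload.
assert b"".join(store.read_block(i) for i in block_ids) == payload
```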

Block storage is the format of choice in a SAN storage system. Because of the lack of metadata, block storage comes with little overhead, resulting in faster data retrievals and more efficient data storage. Block storage is well suited for workloads such as relational databases, virtual desktop infrastructures, and email servers, as well as for implementing RAID arrays. On the other hand, block storage also comes with several disadvantages, such as limited scalability and increased complexity. It can also get quite pricey.

A relative newcomer to the storage scene, object storage was developed to address the growing silos of unstructured data arriving with every wave of new Internet technology—from social media to big data analytics to the Internet of Things (IoT). In object storage, data is broken into self-contained, modular units (objects) that include identifiers and customizable metadata.
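
To contrast with the block sketch earlier, here’s an equally simplified toy object store in Python. Each object is self-contained, addressed by a key, and carries its own customizable metadata; the key and metadata values are hypothetical:

```python
# A toy object store to contrast with the block sketch above: each object is
# self-contained, addressed by a key, and carries its own customizable
# metadata. The key and metadata values here are hypothetical.
class ObjectStore:
    def __init__(self):
        self._objects = {}  # key -> (data, metadata)

    def put(self, key: str, data: bytes, metadata: dict) -> None:
        self._objects[key] = (data, dict(metadata))

    def get(self, key: str):
        return self._objects[key]

store = ObjectStore()
store.put(
    "sensors/device-42/2020-01-01.json",
    b'{"temperature": 21.5}',
    {"source": "iot-gateway-7", "content-type": "application/json"},
)

data, metadata = store.get("sensors/device-42/2020-01-01.json")
print(metadata["source"])  # the metadata travels with the object itself
```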

Object storage has its origins in the cloud, although on-premises solutions are quickly making their mark. Because object storage is highly scalable, flexible, and accommodates configurable metadata, it is well suited for large volumes of unstructured data, advanced analytics, and web applications, as well as for backups and archiving. That said, object storage is not known for raw performance, and even read operations can experience latency issues. Plus, all that metadata can translate to extra overhead, impacting performance even more.

The New Age of Storage

Clearly, there are many factors to consider when planning storage for your application workloads, and the topics I’ve discussed here barely scratch the surface. As the series progresses, I’ll dig deeper into these issues and go into other aspects of storage as well to help provide a fuller picture of today’s storage landscape.

Keep in mind, however, that the storage industry is a dynamic one, with new innovations every day, driven in no small part by the steady influx of data, which is outpacing our ability to maintain it. More than ever, developers, administrators, and anyone else working with data need to understand the storage technologies at hand, the new innovations heading their way, and how both might help support their data-intensive workloads now and in the foreseeable future.