Storage 101: The Language of Storage

Robert Sheldon continues his series on storage. This article covers some of the basics of performance metrics, HDDs, and SSDs.

The series so far:

  1. Storage 101: Welcome to the Wonderful World of Storage
  2. Storage 101: The Language of Storage
  3. Storage 101: Understanding the Hard-Disk Drive 
  4. Storage 101: Understanding the NAND Flash Solid State Drive
  5. Storage 101: Data Center Storage Configurations
  6. Storage 101: Modern Storage Technologies
  7. Storage 101: Convergence and Composability 
  8. Storage 101: Cloud Storage
  9. Storage 101: Data Security and Privacy 
  10. Storage 101: The Future of Storage
  11. Storage 101: Monitoring storage metrics

Storage drives come in many shapes and sizes, and it can be difficult to distinguish one from the other beyond their appearances because vendor-supplied information is sometimes confusing and obscure. Although the material has gotten better over the years, it can still be unclear. Yet understanding this information is essential to knowing how well a drive will perform, how much data it will hold, its expected lifespan, and other important features.

In the first article in this series, I introduced you to a number of storage-related concepts, all of which can play an important role in determining what each drive offers and how they differ. For example, a solid-state drive (SSD) that uses the Peripheral Component Interconnect Express (PCIe) interface will typically perform better than one that uses the Serial Advanced Technology Attachment (SATA) interface, and a SATA-based SSD will likely perform better than a SATA-based hard-disk-drive (HDD).

But the interface and drive type are only part of the equation when it comes to choosing storage media. You must also take into account latencies, input/output operations per second (IOPS), throughput, effective and usable capacities, data transfer size or I/O size, endurance, and other factors. Unfortunately, it’s no small matter trying to make sense of all these variables, especially with the inconsistencies among storage vendors in how they present information about their products.

In this article, I dig into concepts commonly used to describe storage media to help make sense of the information you’ll encounter when evaluating HDDs or SSDs for your organization. Many of the concepts are specific to performance, but I also discuss issues related to capacity and lifespan, particularly in how they differ between HDDs and SSDs.

Making Sense of Performance Metrics

When architecting storage solutions to meet enterprise requirements such as performance, you should identify the workloads that the devices must support. To this end, you must understand data access patterns such as read operations versus writes, random operations versus sequential, and block transfer size.

In this regard, storage operations can be divided into four types: random reads, random writes, sequential reads, and sequential writes. In some cases, these operations can be further divided by the block transfer size (small versus large), which depends on the application. Many workloads use a mix of two or more of these types, although they might favor one over the others. Common data access patterns include:

  • Random read/write, small block: A wide variety of applications such as transactional business applications and associated databases.
  • Sequential write, large block: Loading media, loading data warehouse.
  • Sequential read, large block: Reading media, data warehouse aggregations and reporting.
  • Sequential write, small block: Database log writes.
  • Mixed: Any combination of the above. Note that, when multiple sequential workloads are active concurrently, the workload becomes randomized.

When shopping for storage, you’ll encounter an assortment of metrics that describe how well a drive is expected to perform. Understanding these metrics is essential to ensuring that you’re using the right drives to support your specific workloads.

Latency refers to a drive’s response time, that is, how long it takes for an I/O operation to complete. From an application’s perspective, latency is the time between issuing a request and receiving a response. From a user perspective, latency is the only metric that matters.

Vendors list latency in milliseconds (ms) or microseconds (µs). The lower the number, the shorter the wait times. However, latency can quickly increase as individual I/O requests start piling up, as I/O sizes grow (they typically range from 4 KB to 512 KB), or as the nature of the workload changes, such as shifting from read-only to read/write or from sequential to random. For example, a drive’s latency might be listed as 20ms, but if the drive is supporting many concurrent read operations, I/O requests could end up in a long queue, causing a dramatic increase in latency.

Latency is nearly always a useful metric when shopping for drives and should be considered in conjunction with IOPS. For example, a storage solution that provides 185 IOPS with an average latency of 5ms might deliver better application performance than a drive that offers 440 IOPS but with 30ms latency. It all depends on the workloads that the drives will need to support.
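To see how latency and IOPS interact, it helps to apply Little’s Law, which relates throughput, response time, and concurrency: the average number of outstanding I/O requests equals IOPS multiplied by average latency. The sketch below uses the hypothetical figures from the comparison above.

```python
# Little's Law relates throughput (IOPS), latency, and concurrency:
#   outstanding I/Os = IOPS x average latency (in seconds)
# Drive figures are the hypothetical ones from the comparison above.

def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Average number of I/O requests in flight at the given rates."""
    return iops * (latency_ms / 1000.0)

drive_a = outstanding_ios(185, 5)    # roughly 0.9 requests in flight
drive_b = outstanding_ios(440, 30)   # roughly 13 requests in flight

print(f"Drive A queue depth: {drive_a:.1f}")
print(f"Drive B queue depth: {drive_b:.1f}")
```

The takeaway: the higher-IOPS drive only achieves its numbers by keeping a deep queue of requests waiting, which is exactly what an interactive application feels as sluggishness.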

Another common metric is IOPS, which indicates the maximum number of I/O operations per second that the drive is expected to support. An I/O operation is the transfer of data to or from the drive. The higher the number of supported IOPS, the better the performance—at least according to conventional wisdom. In truth, IOPS tells only part of the story and should be considered along with other important factors, such as latency and I/O size. IOPS is most relevant for random data access patterns (and far less important for sequential workloads).

Another important metric is throughput, which measures the amount of data that can be written to or read from a storage drive within a given timeframe. Some resources may refer instead to data transfer rate or simply transfer rate, sometimes according to drive type. For example, you might see transfer rate used more often with HDDs and throughput associated with SSDs. Like other performance metrics, throughput is dictated by the nature of the workload. Throughput is most relevant for sequential data access patterns (and far less relevant for random workloads).
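IOPS, throughput, and I/O size are tied together by simple arithmetic: throughput equals IOPS multiplied by the I/O size. The sketch below uses illustrative figures (not any specific drive’s specs) to show why small random I/Os can post impressive IOPS yet modest throughput, while large sequential I/Os do the opposite.

```python
# Throughput, IOPS, and I/O size are linked by:
#   throughput (MB/s) = IOPS x I/O size (MB)
# The IOPS and I/O sizes below are illustrative, not from any real drive.

def throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Sustained throughput implied by an IOPS rate at a given I/O size."""
    return iops * io_size_kb / 1024.0

# Small random 4 KB I/Os: high IOPS, modest throughput.
print(throughput_mb_s(iops=50_000, io_size_kb=4))    # ~195 MB/s
# Large sequential 512 KB I/Os: far fewer IOPS, more throughput.
print(throughput_mb_s(iops=1_000, io_size_kb=512))   # 500 MB/s
```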

The distinction between sequential and random is particularly important for HDDs because of how data is stored on the platters, although it can also play a role in SSD performance. In fact, there are other differences between the two drive types that you need to understand when evaluating storage, including how each handles capacity and endurance, as the following sections discuss.

What Sets HDDs Apart

Data is written to an HDD in blocks that are stored sequentially or scattered randomly across a platter. Enterprise drives contain multiple platters with coordinated actuator arms and read/write heads that move across the platters (a topic I’ll be covering more in-depth later in the series).

Whenever an application tries to access the data, the platter’s actuator arm must move the head to the correct track and the platter must be rotated to the correct sector. The time required to do so is referred to as the seek time. The time it takes for the platter to rotate to the correct sector is referred to as the rotational latency.

The duration of an I/O operation depends on the location of the head and platter prior to the request. When the data blocks are saved sequentially on the disk, an application can read and write data in relatively short order, reducing seek times and rotational latencies to practically nothing. If the blocks are strewn randomly across the platters, every operation requires the actuator to move the head to a different area on the platter, resulting in rotational latency and seek time, and therefore slower performance.

Because of these differences, it is essential to evaluate HDDs in terms of the workloads you plan to support, taking into account such factors as file size, concurrency, and data access patterns (random vs. sequential, read vs. write, big block vs. small block, and mixed). By taking your workloads into account, you’ll have a better sense of how to select the right drives for the workloads demanded by your organization’s applications.

For example, Dell offers a 12-TB SATA drive that supports up to 118 IOPS for random reads and 148 IOPS for random operations that mix 70% reads and 30% writes at a given latency. The same drive, however, offers a throughput of 228 MB/s for sequential read and write operations. From these metrics, you can start to get a sense of whether the drive can meet the needs of your anticipated workloads.

Suppose you’re looking for a storage solution to support a set of read-intensive web applications whose storage access pattern is primarily random reads, as opposed to sequential reads. You would want to compare the Dell drive against other drives to determine which one might offer the best IOPS, with less emphasis placed on the other types of access patterns.

One characteristic driving HDD performance is revolutions per minute (RPM), that is, the number of times the drive’s platters rotate within a minute. The higher the RPM, the faster the data can be accessed, leading to lower latencies and higher performance. Enterprise-class HDDs typically support 10,000 or 15,000 RPM, often written simply as 10K or 15K RPM.
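The link between RPM and latency is mechanical: on average, the platter must spin half a revolution before the desired sector passes under the head. A quick sketch of that calculation:

```python
# Average rotational latency is the time for half a platter revolution:
#   latency_ms = (60 / RPM) / 2 * 1000

def avg_rotational_latency_ms(rpm: int) -> float:
    """Mean rotational delay in milliseconds for a platter at the given RPM."""
    return (60 / rpm) / 2 * 1000

for rpm in (7200, 10_000, 15_000):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

A 15K drive averages about 2 ms of rotational latency versus roughly 4.2 ms for a 7,200 RPM drive, and that delay is incurred on every random access before seek time is even counted.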

You must also take into account a drive’s available capacity, keeping in mind that you should never fill an HDD to anything close to its physical capacity.

A drive’s anticipated lifespan is indicated by its mean time between failures (MTBF) rating, the number of operating hours expected before failure. For example, Dell offers several 14-TB and 16-TB drives with MTBF ratings of 2,500,000 hours, which comes to over 285 years. Such ratings are common, yet in reality drives fail far more often than such a high MTBF suggests. MTBF is only a small part of the reliability equation. Enterprise solutions demand identifying and eliminating single points of failure and building redundancy across components, starting at the drive level.
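A more intuitive way to read an MTBF figure is to convert it to an annualized failure rate (AFR), which is what matters at fleet scale. The sketch below uses the 2,500,000-hour rating from the Dell example above.

```python
# MTBF implies an annualized failure rate (AFR), which is easier to
# reason about across a fleet of drives. The MTBF figure matches the
# 2,500,000-hour example in the text.

HOURS_PER_YEAR = 8760

def annualized_failure_rate(mtbf_hours: float) -> float:
    """Fraction of drives expected to fail per year at the rated MTBF."""
    return HOURS_PER_YEAR / mtbf_hours

afr = annualized_failure_rate(2_500_000)
print(f"AFR: {afr:.2%}")                                  # about 0.35%
print(f"Expected failures in a 1,000-drive fleet: {afr * 1000:.1f}/year")
```

Even at that optimistic rating, a data center running a thousand such drives should still expect a few failures every year, which is why redundancy matters regardless of MTBF.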

By looking at the various metrics, you have a foundation to begin comparing drives. Historically, marketing considerations—trying to present their drives in the best light—resulted in presenting performance specs in a way that made comparisons across vendors challenging. Today’s specifications are generally more consistent, at least enough to make reasonable apples-to-apples comparisons. Some vendors also provide insights beyond the basics, for example, providing performance metrics featuring a mix of workloads, such as 70% reads and 30% writes at a given latency.

What Sets SSDs Apart

Consumer and enterprise SSDs are based on NAND flash technologies, a type of nonvolatile memory in which data is stored by programming integrated circuit chips, rather than manipulating magnetic properties, as with an HDD. Also unlike the HDD, the SSD has no moving parts, which makes read and write operations much faster.

If you’re relatively new to enterprise SSDs, the terminology that surrounds these drives can be confusing. Yet the fundamental performance considerations are exactly the same: latency, IOPS, throughput, capacity, and endurance.

As with an HDD, capacity in an SSD refers to the amount of data it can hold. With SSDs, however, vendors often toss around multiple terms related to capacity and do so inconsistently, so it’s not always clear what each one means. For example, some vendors list a drive’s total capacity as raw capacity or just capacity, both of which refer to the same thing—the maximum amount of raw data that the drive can hold.

Not all of the drive’s raw capacity is available for storing data. The drive must be able to accommodate the system overhead required to support various internal SSD operations. For this reason, the amount of available capacity is always less than the amount of raw capacity. Vendors often refer to available capacity as usable capacity.

Conceptually, raw capacity and usable capacity are similar between HDDs and SSDs. For example, when calculating capacities, you should take into account that some of that space must be available for system data, as well as for recovery configurations such as RAID.

Another characteristic that sets the SSD apart is the way in which bits are stored in the flash memory cells. Today’s SSDs can store up to four bits per cell, with talk of five bits in the wings. Vendors often reference the bit-level in the drive’s specifications. For example, a drive that supports three bits per cell is referred to as triple-level cell (TLC) flash, and a drive that supports four bits per cell is referred to as quad-level cell (QLC) flash.

In addition to squeezing more bits into a cell, vendors are also trying to get more cells onto a NAND chip. The more bits the chip can support, the greater the data density.

As with HDDs, SSD endurance refers to the drive’s longevity or lifespan; however, it’s measured differently. SSD endurance is based on the number of program/erase (P/E) cycles the drive will support. NAND cells can tolerate only a limited number of P/E cycles; the higher that number, the greater the endurance. A drive’s endurance is measured by its write operations because read operations have minimal impact on an SSD. Whereas the HDD endurance metric is MTBF, vendors report an SSD’s endurance by providing the number of drive writes per day (DWPD), terabytes written (TBW), or both. The DWPD metric refers to the number of times you can completely overwrite the drive’s capacity each day during its warranted lifespan. The TBW metric indicates the total amount of data that can be written to the drive over its lifetime. Both DWPD and TBW are quite useful for comparing drives.
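DWPD and TBW describe the same endurance budget from two angles, so you can convert between them if you know the drive’s capacity and warranty period. The figures below are hypothetical, chosen only to illustrate the arithmetic.

```python
# DWPD and TBW express the same endurance budget; converting between
# them requires the drive's capacity and warranty period.
# The drive capacity, DWPD rating, and warranty below are hypothetical.

def dwpd_to_tbw(dwpd: float, capacity_tb: float, warranty_years: float) -> float:
    """Total terabytes written over the warranty at the rated DWPD."""
    return dwpd * capacity_tb * 365 * warranty_years

# A hypothetical 3.84 TB drive rated at 1 DWPD over a 5-year warranty:
print(dwpd_to_tbw(dwpd=1, capacity_tb=3.84, warranty_years=5))  # 7008.0 TBW
```

This conversion is handy when one vendor quotes DWPD and another quotes TBW: normalize both to TBW (or DWPD) and the endurance comparison becomes direct.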

As they squeeze more bits into each cell and more cells into each chip, SSD vendors incorporate sophisticated technologies to maintain data integrity at these higher densities. For example, all SSDs employ wear leveling, over-provisioning, garbage collection, and error correction code (ECC) to extend the drive’s life, all of which I’ll be covering in more detail later in the series.

Moving Toward a Better Understanding of Storage

You should use the vendors’ published information only as a starting point for evaluating drives and identifying candidates. Consider additional research such as community reviews, benchmarks, or other resources to give you a realistic understanding of a drive’s capabilities. Your goal is to get as complete a picture as possible before investing in and testing solutions.

In addition, when comparing storage solutions, be sure to take into account the effective capacity, which is the amount of storage available to a drive after applying data-reduction technologies such as deduplication and compression. Such data-reduction strategies make it possible for the drive to store much more data. For example, IBM offers a flash-based storage system that provides 36.9 TB of raw capacity but supports up to 110 TB of effective capacity.
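The relationship between raw and effective capacity is usually expressed as a data-reduction ratio. A quick sketch, using the IBM figures from the example above:

```python
# The data-reduction ratio ties raw and effective capacity together.
# Figures come from the IBM example above (36.9 TB raw, 110 TB effective).

def reduction_ratio(effective_tb: float, raw_tb: float) -> float:
    """How many terabytes of logical data fit per terabyte of raw media."""
    return effective_tb / raw_tb

ratio = reduction_ratio(effective_tb=110, raw_tb=36.9)
print(f"Data reduction ratio: {ratio:.1f}:1")   # roughly 3:1
```

Bear in mind that such ratios depend heavily on the data itself: already-compressed media or encrypted data will reduce far less than text or databases, so treat a vendor’s quoted ratio as a best case, not a guarantee.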

Clearly, you need to understand a wide range of concepts to determine what types of storage will best support your workloads. Not only must you choose between HDDs and SSDs, but you must also select from within these categories, while taking into account such factors as latency, IOPS, throughput, densities, and numerous other considerations.

In the articles to follow, I’ll be digging into HDDs and SSDs more deeply so you can better understand their architectures and how they differ in terms of capacity, performance, and endurance. With this foundation, you’ll be better equipped to determine what storage solutions you might need for your organization, based on your applications and workloads.