The series so far:
- Storage 101: Welcome to the Wonderful World of Storage
- Storage 101: The Language of Storage
- Storage 101: Understanding the Hard-Disk Drive
- Storage 101: Understanding the NAND Flash Solid State Drive
- Storage 101: Data Center Storage Configurations
- Storage 101: Modern Storage Technologies
- Storage 101: Convergence and Composability
- Storage 101: Cloud Storage
- Storage 101: Data Security and Privacy
- Storage 101: The Future of Storage
- Storage 101: Monitoring Storage Metrics
In the previous article in this series, I introduced you to direct-attached storage (DAS), network-attached storage (NAS), and the storage area network (SAN), three storage configurations that have been widely implemented in both data centers and office settings. For the most part, these configurations represent traditional approaches to storing data, that is, they’ve been part of the storage landscape for decades and are starting to show their age.
That’s not to suggest they’re on their way to obsolescence, but they are being forced to make room for more modern technologies that are working their way into data centers, cloud environments, remote and branch offices, and other settings—often alongside or as part of their more traditional counterparts.
In this article, I introduce you to five important technologies that have been steadily infiltrating IT infrastructures: software-defined storage, the virtual storage area network, intelligent storage, computational storage, and storage-class memory. Some of these technologies have been quicker to take hold than others, with the extent of adoption varying from one to the next, but they all represent important trends in data storage and are becoming an increasingly common presence in the infrastructure.
Software-Defined Storage

Although traditional storage configurations still play a vital role, they were not designed to meet the demands of today’s massive volumes of dynamic, distributed, and heterogeneous data. Some IT teams are addressing these challenges by turning to software-defined storage (SDS), a software-based solution that provides an abstraction layer between the applications and storage devices, in effect unbundling the storage software from the underlying hardware.
Ideally, an SDS solution will run on commodity servers and support a wide range of storage devices, removing any dependencies on proprietary hardware or its software. The SDS solution controls storage requests from the application, while managing the storage resources themselves. Separating the data plane from the control plane in this way can lead to greater operational agility and control over where and how data is stored.
Although vendors take different approaches to SDS, solutions commonly use virtualization to consolidate the physical storage devices into logical resource pools that can be dynamically controlled and allocated to the applications that need them. An SDS solution exposes standards-based APIs for provisioning and managing resources, making it easier to automate operations, support development efforts such as infrastructure-as-code (IaC), and integrate with container orchestration tools such as Kubernetes.
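The pooling-and-provisioning pattern described above can be illustrated with a minimal sketch. All class and method names here are hypothetical, invented for illustration; real SDS products expose far richer APIs, but the core idea is the same: individual devices disappear behind one logical pool, and the control plane decides whether a request can be satisfied.

```python
# Minimal sketch of an SDS control plane (hypothetical names): physical
# devices are consolidated into one logical pool, and volumes are carved
# out of the pool on demand through a simple provisioning API.

class StoragePool:
    def __init__(self, device_capacities_gb):
        # Virtualization step: individual devices vanish behind a
        # single aggregated capacity figure.
        self.capacity_gb = sum(device_capacities_gb)
        self.volumes = {}

    def provision(self, name, size_gb):
        # Control plane: decide whether the request can be satisfied,
        # without the application knowing which device backs it.
        if size_gb > self.free_gb():
            raise ValueError("insufficient capacity")
        self.volumes[name] = size_gb
        return name

    def free_gb(self):
        return self.capacity_gb - sum(self.volumes.values())

# Pool three commodity devices and provision from the aggregate.
pool = StoragePool([500, 500, 1000])
pool.provision("app-data", 1200)   # spans devices transparently
print(pool.free_gb())              # 800
```

Because provisioning is just an API call against the pool, it is straightforward to drive from automation tooling such as IaC pipelines, which is precisely the operational benefit SDS vendors emphasize.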
One of the biggest advantages of SDS is flexibility. Not only do IT teams have more hardware choices, but applications also benefit because storage resources can be allocated and scaled on demand. In addition, an SDS solution can better utilize the physical resources, which can translate to lower costs, especially when you eliminate proprietary storage systems and the vendor lock-in that comes with them. SDS can sometimes even improve performance through the use of parallelism, data tiering, and data caching.
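The caching benefit mentioned above can be sketched in a few lines. This is a toy model, not any vendor's implementation: Python's `functools.lru_cache` stands in for a small fast tier (RAM or SSD) sitting in front of a slow backing tier, so that repeated reads of a hot block never touch the slow device.

```python
# Toy model of data caching in a storage stack: only cache misses reach
# the slow backing tier. Names are illustrative.
from functools import lru_cache

SLOW_READS = {"count": 0}   # how many reads actually hit the slow tier

@lru_cache(maxsize=128)
def read_block(block_id):
    SLOW_READS["count"] += 1          # a cache miss: go to slow storage
    return f"data-{block_id}"

for _ in range(3):
    read_block(7)   # first call misses; the next two are served from cache

print(SLOW_READS["count"])   # 1
```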
Yet SDS is not without its challenges. For starters, implementing and maintaining an SDS solution can be a complex undertaking, especially when working with storage products from multiple vendors. These multi-vendor scenarios can also make it more difficult to get vendor support or even to identify the source of a specific problem. In addition, SDS solutions might not be as hardware-agnostic as sometimes suggested, and some SDS products might not include all the features available in dedicated proprietary systems, although that situation has been steadily improving.
Virtual Storage Area Networks

Another technology that organizations have been leveraging to help address modern workloads is the virtual storage area network (VSAN), a mechanism for separating and isolating traffic on networks such as Fibre Channel or Ethernet. In a VSAN configuration, the physical SAN is divided into logical partitions that segregate devices connected to the same fabric. For example, you can create VSANs to separate teams with different security or performance requirements, or you can use them to isolate backup traffic from production traffic.
This type of VSAN is different from what you see in products that use the term virtual SAN, VSAN, or even vSAN (lowercase “v”) to describe SDS capabilities. For example, VMware offers a product called vSAN (formerly Virtual SAN), an SDS solution used in conjunction with VMware vSphere to provide a foundation for hyperconverged infrastructures. VMware vSAN creates logical resource pools made up of the DAS devices connected to the infrastructure’s vSphere clusters, and then makes those resources available to the cluster’s virtual machines.
The VSAN in the context of this article has its roots in Cisco Systems and is specific to SAN implementations. Each logical VSAN supports the same operations and configurations available to the physical SAN, but each can be configured independently to meet specific needs. Devices within a VSAN can freely communicate with one another but cannot communicate with devices outside their own VSAN, even if they’re connected to the same physical SAN. In this way, an organization can build a single SAN topology yet still enjoy the benefits of logical topologies that are independent of the geographic locations of the SAN’s switches and attached devices.
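The isolation rule can be modeled in a few lines. This is a deliberately simplified sketch with invented device and VSAN names: all devices share one physical fabric, but reachability is decided entirely by logical membership.

```python
# Toy model of VSAN isolation: devices on the same physical fabric can
# exchange traffic only when they belong to the same VSAN. All names
# here are hypothetical.

fabric = {
    "host-a":     "VSAN-10",   # production traffic
    "array-1":    "VSAN-10",
    "backup-srv": "VSAN-20",   # backup traffic, logically isolated
    "tape-lib":   "VSAN-20",
}

def can_communicate(dev1, dev2):
    # Same physical SAN, but the logical partition decides reachability.
    return fabric[dev1] == fabric[dev2]

print(can_communicate("host-a", "array-1"))     # True
print(can_communicate("host-a", "backup-srv"))  # False
```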
Cisco’s VSAN was approved as a standard of the American National Standards Institute (ANSI) in October 2004, lending credence to its importance in the data center. Although a VSAN might not qualify as a modern technology, it has come to play an important role in cloud computing and virtualized environments because it enables SAN topology modifications without having to change the actual physical structure. VSANs also make it easier to scale storage resources to support fluctuating workloads and to ensure network redundancy. If one VSAN fails, services can be switched over to another VSAN on the same physical network.
Intelligent Storage

The ever-growing volumes of heterogeneous data bring with them an assortment of performance, maintenance, and security concerns. To help address these issues, vendors have been steadily incorporating intelligence into their storage solutions. Intelligent storage leverages artificial intelligence (AI) and other advanced technologies to proactively manage systems, optimize performance, and address potential issues before they occur.
An intelligent system continuously learns from its environment and automatically adjusts its behavior accordingly. The system collects telemetry data from participating storage systems, aggregates and analyzes the data, and then uses what it learns to maintain and optimize those systems. When effectively implemented, intelligent storage solutions can deliver greater reliability, security, resource utilization, and application performance.
An intelligent storage system relies on a sophisticated analysis engine that leverages AI technologies such as machine learning and deep learning, along with other advanced technologies, including predictive analytics. The engine identifies patterns and anomalies in the data in order to predict problems, forecast trends, identify performance issues, and address other potential issues. At the same time, the engine is continuously learning from the collected data, leading to more accurate predictions and subsequently more efficient storage systems.
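To make the pattern-and-anomaly idea concrete, here is one simple way a telemetry analyzer might flag an outlier: compare each new sample against the mean and standard deviation of recent history. This is a minimal statistical sketch for illustration only, not any vendor's actual engine, which would use far more sophisticated models.

```python
# Simplified anomaly flagging on storage telemetry: a sample is suspect
# when it deviates from recent history by more than `threshold` standard
# deviations. Illustrative only.
import statistics

def is_anomalous(history, sample, threshold=3.0):
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return sample != mean
    return abs(sample - mean) / stdev > threshold

latency_ms = [1.1, 0.9, 1.0, 1.2, 1.0, 0.8, 1.1, 0.9]
print(is_anomalous(latency_ms, 1.05))  # False: within the normal range
print(is_anomalous(latency_ms, 9.0))   # True: worth raising an alert
```

A production engine would learn these baselines continuously per device and per workload, which is what makes its predictions improve over time.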
An intelligent storage solution can automatically predict outcomes and prevent issues before they occur, while taking steps to optimize workload performance and ensure the data remains secure and in compliance. The solution might alert you to privacy issues, address security threats, help plan capacity, notify you when storage is low, allocate resources to virtualized workloads, or carry out a variety of other operations.
Vendors incorporate intelligence into their storage solutions in different ways. For example, Hewlett Packard Enterprise (HPE) provides storage intelligence through its InfoSight service, which collects telemetry data every second from millions of sensors on systems deployed across the globe. InfoSight continuously analyzes the data and then applies the results of that analysis to individual customer systems.
Dell EMC takes a different approach, embedding a machine learning engine directly into the storage solution itself, making it possible for each system to make rapid decisions autonomously, without relying on continuous external input. The engine analyzes data collected from the local components, using a reinforcement learning model to quickly address allocation issues.
The advantage of this approach is faster response times, because the system is not waiting on input from an external service, although it lacks immediate access to analytics run against a global dataset. That said, Dell storage customers can also take advantage of the vendor’s CloudIQ service, which provides monitoring, analytics, and insights for Dell storage devices.
Computational Storage

In a traditional compute/storage architecture, data travels between the storage device and the computer’s memory, where it can be processed in response to application requests. Under normal operations, data moves freely between the two, with few latency issues or bottlenecks. However, modern workloads such as AI and big data analytics can run into performance issues because the I/O paths that sit between storage and memory have limited bandwidth and can’t keep up with demand, resulting in bottlenecks that slow response times.
To address this issue, several vendors now offer computational storage solutions that move at least some of the processing to the storage platform itself, an approach sometimes referred to as in-situ processing. Computational storage brings storage and compute resources closer together within the storage layer, where the data can be preprocessed on behalf of the server. Not only does this shorten the data access path and reduce traffic flow—and the latencies that come with it—but the computational components can also take advantage of the parallel processing capabilities inherent in a storage solution, leading to even faster performance.
Computational storage can potentially benefit any latency-sensitive application that processes large quantities of data. It might also benefit edge computing and Internet of Things (IoT) scenarios, where compute resources are often limited by size. For example, you might aggregate a massive dataset in-situ and then send only the aggregated results to the server’s memory for additional processing. In this way, you reduce the amount of data that must pass through the I/O ports, while minimizing the impact on compute resources, which in turn frees them up for other workloads.
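The in-situ aggregation scenario above can be sketched as follows. This is a conceptual model with invented function names, not a real device API: the point is simply how many records cross the I/O path in each approach.

```python
# Sketch of in-situ aggregation: instead of shipping every record to the
# host to compute a sum, the storage device aggregates locally and ships
# only the result. Function names are hypothetical.

def host_side_sum(records):
    # Conventional path: every record crosses the I/O bus to the host.
    transferred = len(records)
    return sum(records), transferred

def in_situ_sum(records):
    # Computational-storage path: the sum runs on the drive's own
    # processor, and a single value crosses the bus.
    partial = sum(records)
    transferred = 1
    return partial, transferred

sensor_readings = list(range(1_000_000))
total_a, moved_a = host_side_sum(sensor_readings)
total_b, moved_b = in_situ_sum(sensor_readings)
print(total_a == total_b)   # True: same answer either way
print(moved_a, moved_b)     # 1000000 vs 1 records across the I/O path
```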
Although several vendors now offer computational storage systems, the industry is still very young. It’s not uncommon to run into integration issues because of the differences in implementations. Fortunately, the Storage Networking Industry Association (SNIA) has launched an effort to define interface standards for deploying, provisioning, managing, and securing computational storage devices.
Storage-Class Memory

Another modern technology that’s generating a lot of buzz right now is storage-class memory (SCM), a type of memory that’s nearly as fast as dynamic random-access memory (DRAM) but, like NAND flash, is nonvolatile (that is, it retains data even when power is removed). SCM also comes at a lower per-byte cost than DRAM, while substantially outperforming NAND flash. It might even deliver greater endurance than NAND.
Like computational storage, SCM is still a young technology, but it has a great deal of momentum behind it, with Intel at the forefront. You’ll likely see SCM also referred to as persistent memory, PMEM, or P-MEM. Some sources distinguish between SCM and persistent memory, depending on how the technology is implemented, but such inconsistencies are common in a nascent market, and no doubt the industry will eventually settle on a common nomenclature.
Discussions of SCM often center on the idea of a new tier in the memory/storage hierarchy, with SCM modules sitting between DRAM and NAND flash. Like DRAM, the SCM device is byte-addressable and can connect directly to the server’s memory space, providing an effective way to support latency-sensitive applications that need more memory than DRAM can practically provide.
By bridging the gap between traditional memory and storage, SCM makes it possible for applications to access large datasets through the system memory space, resulting in substantially faster read and write operations. At the same time, an SCM device can support block-level access like NAND flash, providing greater versatility than either DRAM or NAND.
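The byte-addressable, persistent programming model can be approximated with an ordinary memory-mapped file. To be clear, real SCM would be mapped directly into the address space (for example via DAX and a persistent-memory library); this sketch only mirrors the model: the application stores individual bytes through the memory space, with no block I/O call, and the data survives the process.

```python
# Approximating the SCM programming model with a memory-mapped file:
# byte-granular loads and stores through the memory space, followed by a
# flush to make them durable. File path and sizes are arbitrary.
import mmap, os, tempfile

path = os.path.join(tempfile.gettempdir(), "scm_demo.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)          # reserve a 4 KB persistent region

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mem:
        mem[10] = 0x42               # byte-granular store, no block I/O call
        mem.flush()                  # persist, akin to flushing CPU caches

with open(path, "rb") as f:          # later, after a "restart"
    assert f.read()[10] == 0x42      # the byte is still there
print("byte at offset 10 persisted")
```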
Initially, much of the focus on SCM technology has been on devices that can be used as storage cache or to replace NAND flash solid-state drives (SSDs). Intel led the way in this effort with its line of Optane DC SSDs, which work much like NAND flash SSDs but deliver greater performance.
More recently, Intel introduced its Optane DC persistent memory modules. These plug directly into standard dual in-line memory module (DIMM) slots. An Optane module can store up to 512 GB of data, far exceeding today’s DRAM, although such capacities might become more common for DRAM in the near future. In this way, the module can serve as a storage tier between DRAM and NAND flash, moving us closer to the original SCM vision.
It’s also possible to use SCM in place of DRAM. Although the SCM modules are slower, their ability to persist data makes them well suited as bootable devices. For example, you might use SCM for a production server that needs to be up and running as quickly as possible after a planned or unplanned restart.
Optane DC persistent memory is based on 3D XPoint technology, which represents a joint effort between Intel and Micron Technology. Micron recently released its first product based on 3D XPoint, the X100 SSD, following the same path as Intel by first introducing an SSD. However, 3D XPoint is not the only SCM effort under way. Other vendors are working on their own solutions based on technologies such as magnetoresistive RAM (MRAM) and nanotube RAM (NRAM).
Moving toward the Future
There is, of course, much more to each of these technologies than what I can cover in a single article, and there are plenty of other emerging technologies as well, such as 5D storage, which uses ultra-fast laser technology to embed data on fused silica glass.
5D storage has the potential to store up to 360 TB of data on a single 12-centimeter silica disk, with the data remaining viable for over 13 billion years. In fact, a disk containing Isaac Asimov’s entire Foundation series is currently orbiting our sun, tucked into Elon Musk’s cherry-red Tesla Roadster, aboard the Falcon Heavy SpaceX rocket launched in February 2018.
Like the orbiting silica disk, storage technologies are in a state of constant improvement, and the future of storage is uncertain yet exciting. What’s certain is that data volume will continue to explode, the data will become increasingly diverse and distributed, and the workloads handling that data will grow more complex and data-intensive, putting greater demands on storage than ever.
The storage technologies of the future will have to accommodate both the amount and the complexity of that data, while supporting applications that are growing more sophisticated and robust every day. They must also ensure that data remains secure and protected against cyberattacks that are themselves growing more sophisticated. The modern storage solutions emerging now are blazing the trail toward that future, but they alone cannot address all the challenges that loom ahead; instead, they will spearhead a new generation of innovative technologies.