Data as a Service: The Next "As a Service" Wave?

There was a time when data seemed part of the application that maintained and used it. Now, there is increasing demand to deliver data through platform-agnostic open-standard APIs so it can be consumed in a variety of ways, whether refined, aggregated, or combined with additional information. Are we heading towards a shared understanding of applications as data providers that feed other services such as BI or even, in the right circumstances, publish the data themselves?

I realize I’m preaching to the choir, but I’ll continue anyway. It’s all about the data. Period. We build solutions to store data and present data and analyze data and process data in any number of ways. No matter how slick we make the interfaces or rich we make the features, somewhere lurking in the background is the data that drives our systems and brings value to our applications.

And that data is becoming bigger and more complex all the time, along with the rules and policies that govern its use. So it was just a matter of time before we saw the emergence of yet another “as a service” offering, this time data as a service, or as the current vernacular would have it, DaaS (not to be confused with desktop as a service, database as a service, or big data as a service, all of which have adopted the DaaS moniker at one time or another, whether appropriately or not).

Make Way for the New Kid on the Block

In many ways, DaaS seems the inevitable outcome of our data evolution. We started by sticking data in self-contained repositories, such as relational databases or file folders, and built software to access and present the data in formats users could understand. This led to a coupling between the repository and the application that often resulted in the two being inextricably entwined, leading to data silos throughout the enterprise that stood independently of one another, with no easy way to link them together.

As the number of silos grew and their sizes and variety increased, enterprises sought ways to have these applications talk to one another, which introduced an intricate interface layer that morphed into the collective technology known as Enterprise Application Integration (EAI). Unfortunately, EAI tools added to the complexity of managing these systems and tended to result in vendor lock-in in order to make all the components play nice with each other. To make matters worse, EAI often failed to improve data utilization to any significant degree, despite the hefty price tag that came with it.

While all this was going on, data continued to grow and become more complex, in no small measure due to the Internet. Information proliferated both inside and outside the enterprise, adding mountains of unstructured data to the mix and turning the relational world on its normalized head. This has led not only to a crisis in data quality, but also to one in data intelligence. Trying to make sense of this mass of internal enterprise data is one thing. Adding web cookies and mobile identifiers and social networking claptrap is quite another, especially while trying to negotiate the entangled worlds of privacy, security, and compliance.

At the same time, we’ve seen the fruition of service-oriented architecture (SOA), which makes it possible to decouple the data from the application, and we’ve witnessed significant advancements in virtualization and integration technologies, including open-standard protocols that streamline data transfers and communications. Storage and processing technologies have also made great strides in recent years.

The proliferation of data and advancements in technology, along with the growing popularity of cloud solutions, have resulted in a shift in thinking, away from the traditional data/application pairing to a model that renders the data platform irrelevant. At the same time, decision-makers have started to come to terms with the fact that data really is the linchpin that holds all the other structures together and should be accorded the respect it deserves, helping to further herald the age of DaaS.

Indeed, DaaS seems the logical next step in the evolution of “as a service” computing. Early forms of DaaS have already made their mark. Web mashups are often cited as some of the earliest DaaS implementers; however, we can also find the beginning stages of DaaS in syndication feeds (such as blog posts and news feeds) that delivered dynamically changing document collections in a platform-agnostic format. Governments, financial services, and the telecom industry also arrived early at the DaaS table and have helped to define the industry’s direction.

DaaS lets us decouple the data from the application in order to meet the challenges of the increasingly diverse datasets that are filling the airwaves, while taking advantage of the flexibility and efficiency of the SOA model, all without being tied to a specific application or platform. In this way, data can be delivered to the people who need it in a format that can be consumed immediately. Analyses that once took weeks or even months to perform can now be achieved in minutes with real-time or near real-time data.

The Way of DaaS

In a coup perhaps long overdue, DaaS separates data from its associated applications to deliver meaningful information to users regardless of location or platform. Any application using the appropriate protocols can access the data in order to deliver it in a format best suited to the users who need it the most.

DaaS brings together the technologies necessary to retrieve data from heterogeneous sources such as transactional databases, data warehouses, enterprise resource planning (ERP) systems, and customer relationship management (CRM) solutions. The data is fed into a central location where it is aggregated, cleansed, standardized, and enriched and then offered as an on-demand service to the DaaS subscribers. Part of the DaaS dictum is to deliver relevant data in a timely and secure manner at an affordable cost, if not free, through either public or private cloud platforms based on open standards.
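To make the aggregate, cleanse, standardize, and enrich steps a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two source systems, the field names, and the helper functions are hypothetical, and a real DaaS platform would do far more (validation, lineage, security, scheduling).

```python
from typing import Iterable

# Hypothetical records from two source systems; all field names are invented.
CRM_ROWS = [
    {"Email": " Ann@Example.com ", "FullName": "Ann Lee"},
    {"Email": "bob@example.com", "FullName": "Bob Roy"},
]
ERP_ROWS = [
    {"email_addr": "ann@example.com", "country": "sg"},
    {"email_addr": "cy@example.com", "country": "ES"},
]


def standardize(row: dict, mapping: dict) -> dict:
    """Rename source-specific fields to one shared schema."""
    return {mapping[key]: value for key, value in row.items() if key in mapping}


def cleanse(row: dict) -> dict:
    """Trim whitespace and normalize case on the fields used for matching."""
    cleaned = {key: value.strip() for key, value in row.items()}
    cleaned["email"] = cleaned["email"].lower()
    if "country" in cleaned:
        cleaned["country"] = cleaned["country"].upper()
    return cleaned


def aggregate(rows: Iterable[dict]) -> dict:
    """Merge rows that share an email address into one record (de-duplication)."""
    merged: dict = {}
    for row in rows:
        merged.setdefault(row["email"], {}).update(row)
    return merged


def enrich(record: dict) -> dict:
    """Add a derived field; here, the domain part of the email address."""
    return {**record, "domain": record["email"].split("@")[1]}


def build_catalog() -> list:
    """Run both sources through the pipeline and return the unified records."""
    crm_map = {"Email": "email", "FullName": "name"}
    erp_map = {"email_addr": "email", "country": "country"}
    crm = [cleanse(standardize(row, crm_map)) for row in CRM_ROWS]
    erp = [cleanse(standardize(row, erp_map)) for row in ERP_ROWS]
    merged = aggregate(crm + erp)
    return [enrich(record) for record in merged.values()]


catalog = build_catalog()
```

In this sketch the two rows for Ann collapse into a single record that carries fields from both systems, which is the kind of consolidated view a subscriber would see without knowing which silo each field came from.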

Organizations that turn to DaaS in the public cloud get to transfer at least some of the complexities of data management to the providers offering the service. In this way, business users can focus on the data itself in order to analyze and uncover value without having to be concerned about the underlying technologies or to become experts in their use. As the amount and variety of data continue to grow, so will the need to offload more of that burden.

Even organizations that implement DaaS in a private cloud get the benefit of a centralized platform and the ability to deliver the data to the business users who need it in a format they can use, regardless of the applications they’re working with or their knowledge of the source data.

In a typical DaaS implementation, the data is delivered in blocks that include the data itself as well as metadata that describes the data, such as you might find in an XML document. To facilitate the data transfers, most services provide a set of APIs that can vary in terms of supported interface styles and protocols (e.g., REST or SOAP, typically over HTTP) and data formats (e.g., XML, JSON, or RSS). The APIs make it possible for a variety of applications to consume exactly the data they need, when they need it, and deliver it to the appropriate business users.
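As a rough illustration of a data block that travels with its own metadata and can be rendered in more than one format, consider the following Python sketch. The block layout, the function names, and the sample feed are all hypothetical; they are not taken from any particular DaaS product. Only the standard library's json and xml.etree.ElementTree modules are used.

```python
import json
import xml.etree.ElementTree as ET


def make_block(rows: list, source: str) -> dict:
    """Wrap rows with metadata that describes them (hypothetical layout)."""
    return {
        "metadata": {
            "source": source,
            "row_count": len(rows),
            "fields": sorted({key for row in rows for key in row}),
        },
        "data": rows,
    }


def to_json(block: dict) -> str:
    """Serialize the block as a JSON document."""
    return json.dumps(block, sort_keys=True)


def to_xml(block: dict) -> str:
    """Serialize the same block as XML, with each row's values as attributes."""
    root = ET.Element("block")
    meta = ET.SubElement(
        root,
        "metadata",
        source=block["metadata"]["source"],
        row_count=str(block["metadata"]["row_count"]),
    )
    for name in block["metadata"]["fields"]:
        ET.SubElement(meta, "field", name=name)
    data = ET.SubElement(root, "data")
    for row in block["data"]:
        ET.SubElement(data, "row", {key: str(value) for key, value in row.items()})
    return ET.tostring(root, encoding="unicode")


def render(block: dict, fmt: str) -> str:
    """Return the block in the format the caller asked for."""
    renderers = {"json": to_json, "xml": to_xml}
    if fmt not in renderers:
        raise ValueError(f"unsupported format: {fmt}")
    return renderers[fmt](block)


block = make_block([{"ticker": "ABC", "price": 10.5}], source="demo-feed")
```

A caller would ask for render(block, "json") or render(block, "xml") and get the same data and metadata either way, which is the point: the consumer picks the format, and the service worries about where the rows came from.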

A DaaS solution abstracts the underlying silos in order to deliver data in a platform-agnostic format. The calling application is concerned only with the protocols and formats used to retrieve the data, not with the data sources themselves, allowing information workers to take control of complex information and focus on business outcomes, rather than having to master new technologies.

There is no shortage of ways organizations can use the data once it’s made available through a DaaS platform. DaaS gives them easy access to current information relevant to their specific industries. Marketing and sales, for example, can better understand their buyers and potential buyers by leveraging outside data from social networks and the Web. They can also use the data to better understand current trends and reactions to products in order to provide better customer service.

The health care industry can also benefit from DaaS by being able to share and aggregate patient data, without compromising confidentiality. DaaS gives the medical community better tools to monitor the health of a population and identify those at high risk of particular diseases. Medical facilities can also better monitor the effectiveness of treatment and care, and doctors can use DaaS for screening and diagnostic purposes.

There is no limit to the types of data that can be offered through a DaaS platform. In addition to those already mentioned, DaaS can deliver census, financial, geographic, insurance, retail, supplier, and distributor data, as well as data from a number of other industries. And once the data has been consolidated and cleansed, a variety of applications (stand-alone, web-based, or mobile) can access the data at any time and from anywhere there’s an Internet or network connection. Imagine the analysts who will be able to generate compliance and business intelligence (BI) reports within minutes rather than hours, weeks, or months.

Not So Fast, DaaS

Clearly, the benefits of DaaS are many. Data can be consolidated, cleansed, abstracted, and presented to any number of applications through a set of APIs based on open standards. Not only does this allow subscribers to easily and quickly access data integrated from multiple sources, but it also provides a scalable single-point solution for controlling quality, maintaining security, and providing one version of the truth. Users can access complex information whenever they need it without having to know where the data originates or how it’s structured.

As good as this all sounds, however, DaaS is not without its challenges. It’s no small task integrating data from disparate sources, de-duplicating and cleansing the data, adding value to the data, and presenting the data as an open SOA service. Data might come from relational databases, Hadoop clusters, data warehouses, social media sites, news feeds, mobile devices, or any number of heterogeneous sources. Regardless of the variety of sources, however, the data must be presented as a unified, comprehensive resource that is reliable in terms of accuracy and availability, and the more data grows in size and complexity, the harder the job becomes.

DaaS proponents are also tasked with educating potential subscribers about the benefits of data as a service over traditional app/storage solutions that have a proven track record. In some cases, DaaS advocates will have to convince CIOs and other decision-makers to let go of perspectives they’ve held for years and embrace the notion of service-oriented data decoupled from the applications. In a world where data ownership can equate to political power, the idea of a shared-resources model could easily scare some of them away.

For many, the DaaS model also raises concerns about privacy. Sharing data outside of departmental or organizational walls is a step many are not ready to take. The growing concerns about privacy in this post-Snowden era only fuel the debate, as does the fact that DaaS is still an evolving industry yet to gain a solid foothold.

Then there’s the issue of compliance. Legalities and regulations determine exactly what can and cannot be done with certain data, and these rules vary significantly from one border to the next. Many organizations don’t want to go near DaaS unless they can keep the data entirely in-house.

Even organizations willing to embrace DaaS should be wary of hosting their data in the cloud. In this sense, DaaS is like any cloud service and is subject to the same security risks. And risks there are. The news is full of horror stories about big-time cloud service providers having their data or systems compromised. Look at what happened recently to Dropbox, Snapchat, and Apple’s iCloud, to name a few. Then, of course, there are all the breaches we haven’t heard about.

With DaaS, as with any cloud service, customers also run the risk that the data is not going to be there when they need it. Even if they use DaaS only to access third-party data (and do not keep their own confidential data out there), systems get hacked, ISPs falter, providers go out of business, catastrophes occur. Mission-critical systems that rely on external data face the risk of a single point of failure that can bring an entire operation to a halt. If a DaaS provider fails to deliver for whatever reason, an organization can face irreparable damage.
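One common way to soften that single point of failure, at least for data that can tolerate being slightly out of date, is to keep the last good response and serve it when the provider cannot be reached. The Python sketch below shows the idea under stated assumptions: the CachedFeed class, the fetch callable, and the staleness limit are all hypothetical, and it is not a substitute for a proper continuity plan.

```python
import time
from typing import Callable, Optional


class CachedFeed:
    """Serve the last good response when the provider cannot be reached."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch            # hypothetical call to the DaaS provider
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so the behavior can be tested
        self._cached: Optional[dict] = None
        self._cached_at = 0.0

    def get(self) -> dict:
        try:
            self._cached = self._fetch()
            self._cached_at = self._clock()
            return {"data": self._cached, "stale": False}
        except Exception:
            # Provider failed: fall back to the cached copy if it is recent
            # enough; otherwise re-raise so the caller sees the failure.
            age = self._clock() - self._cached_at
            if self._cached is not None and age <= self._max_age:
                return {"data": self._cached, "stale": True}
            raise
```

The "stale" flag matters as much as the fallback itself. An application that quietly presents old data as current may be worse off than one that fails outright, so the caller gets to decide what to do with a stale answer.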

The Wonderful World of DaaS

Although DaaS has been around for a while, in one form or another, in many ways it is still a technology in its infancy, with various perceptions of what DaaS is and exactly what services should be delivered, in terms of data quality and completeness. This is, in part, because DaaS is still evolving and a definitive standard has yet to emerge, so it’s not surprising that when different people talk about DaaS, they’re not necessarily talking about the same thing.

Currently, DaaS appears to be dividing into two camps. On one side, we have the data broker model in which external data is collected and massaged and offered just like any other service, sometimes free, sometimes not. In the other camp, we have the master data management (MDM) model in which DaaS aggregates data from in-house sources and provides data services for internal systems.

DaaS that falls into the data broker camp is the more popular of the two and can itself take different forms. For each form, those who participate in the service can act as provider, consumer, or both (which only adds to the confusion of what DaaS is). Regardless of who does what, DaaS platforms strive for the same goal: to offer data as an open-standards service to those who want to incorporate that data into their applications.

One area that’s quickly gaining steam is the commercial provider that sells data services to its subscribers. For example, Xignite collects financial data from a wide range of sources and delivers it through an extensive collection of APIs. Each API lets consumers access the data in a unique way and often supports multiple data formats, including XML, JSON, and CSV, allowing subscribers to incorporate the data in their websites, mobile apps, or on-site systems.

Oracle has been particularly aggressive in the data broker market. Oracle’s new DaaS for Business collects user profile and social media data from around the world and then cleanses the data by taking such actions as de-duplication, signal extraction, cross-channel matching, and data verification. Oracle’s entry into the DaaS market arrived several months after acquiring BlueKai, the provider behind the Audience Data Marketplace, touted as the world’s largest third-party data market, with audience data on over 300 million users.

Another form of the data broker model is the organization that sells data that it’s collected for other purposes. Companies such as ad agencies or credit bureaus are already gathering information from multiple sources and cleansing it for their own internal needs. Why not turn an additional profit by selling some of that information? For example, Experian tracks credit-related data on millions of consumer profiles. The company now has a marketing services division that offers some of that data as a service. Subscribers can gain insights into consumer attitudes and behaviors or learn details about profile demographics.

But DaaS isn’t only about making money. As governments hop aboard the Open Data movement, many plan to implement or have already implemented a DaaS platform as a way to collect and distribute public information. Singapore, for example, has launched a DaaS pilot program as part of its “smart nation” strategy. Catalonia, Spain is pursuing a similar program. The LEADS research project, which is funded by the European Commission, is proposing that a DaaS platform be implemented to enable smaller organizations to take advantage of the vast amounts of public data. The UK and US are considering similar strategies for disseminating public information.

In many cases, these projects are based on the premise that participating organizations act as both data producers and consumers. In addition to being able to take advantage of DaaS to enrich their own environments, these organizations also feed data into the platform, giving all participants a way to share data with one another effectively and safely. This model can also extend to other disciplines, such as health care or scientific research, where everyone can benefit from the two-way flow of information in an easily consumable format.

In the other DaaS camp, we have a somewhat different approach, that is, DaaS as a type of MDM solution. Here, organizations implement an internal DaaS platform to collect their own data and serve it up to their own users, whether internal staff, partners, or customers. Data from across the enterprise becomes available in a controlled manner to those who need it, in a format that can be easily consumed. In this model, the DaaS solution often resides in a private cloud. However, just like other forms of DaaS, it still provides the mechanisms necessary to collect, cleanse, protect, and deliver data based on open standards that can be used by a variety of applications.
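What “a controlled manner” means will differ from one organization to the next, but one simple reading is that each audience sees only the fields it is entitled to. The Python sketch below assumes a hypothetical policy table and record layout, purely for illustration; real platforms typically layer authentication, auditing, and row-level rules on top of anything like this.

```python
# Hypothetical policy: which fields each audience may see.
VISIBLE_FIELDS = {
    "staff": {"customer_id", "name", "email", "credit_limit"},
    "partner": {"customer_id", "name"},
    "customer": {"customer_id", "name", "email"},
}


def view_for(role: str, record: dict) -> dict:
    """Return only the fields the given role is allowed to read."""
    try:
        allowed = VISIBLE_FIELDS[role]
    except KeyError:
        # Unknown audiences get nothing rather than everything.
        raise PermissionError(f"unknown role: {role}") from None
    return {key: value for key, value in record.items() if key in allowed}


record = {
    "customer_id": 7,
    "name": "Ann Lee",
    "email": "ann@example.com",
    "credit_limit": 5000,
}
```

With this policy, view_for("partner", record) returns just the customer ID and name, while staff see the full record. The same master record serves all three audiences without being copied into three systems.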

It’s All About the Data

Data is only going to continue to grow in size and complexity. At the same time, the technologies that make DaaS possible will undergo further refinement and become more comprehensive. Decoupling data from the application might not work in all cases, but in many cases it will. Where it does, data can be delivered through platform-agnostic APIs and consumed in a variety of ways, whether it is used as is, further refined, or combined with additional information. Analysts and other business users can then focus on the data itself and not on how it got there.

Certainly there are challenges with DaaS, as there are when delivering any service, but the greater challenge might be to convince decision-makers to let go of how they’ve been viewing data long enough to at least consider a new approach. There are many issues to take into account, of course (privacy, security, and ROI, to name a few), but under the right circumstances, when implemented effectively, DaaS can prove a valuable strategy in this era of mobility and information overload, where data is only going to get bigger and crazier and even more out of control.

In the end, it really is all about the data.