It’s a strange time, right now, to be a DBA. 44 years after E F Codd introduced the term ‘relational database’ in his seminal white paper, A Relational Model of Data for Large Shared Data Banks, a revolution is looming on the horizon once again.
At the time, Codd’s thinking streamlined the way companies thought about data banks. If they became relational databases, information could be retrieved from them faster and more easily. Rather than just being a place to store data, they could become the place to analyze data. The first commercial relational database appeared nine years later and, since then, DBAs have been the brains behind the data. If there’s a question, ask the DBA. If there’s a problem, call the DBA. If there’s a compliment due, praise the DBA.
And it’s worked. Daily monitoring and maintenance by DBAs have kept databases available. Performance tuning and recovery planning have ensured that availability continues with as few problems as possible.
Times have changed.
The world’s moving faster
Shipping updates to databases is an essential part of any DBA’s job, but the frequency of those updates is increasing. Rather than arriving now and again, update requests are coming far more often. The faster changes can be shipped, the quicker new features can be released to ever more demanding customers. (Facebook typically ships updates twice a day.)
Alongside more demanding customers, Agile thinking has made developers more demanding. The Agile Manifesto emerged in 2001 and reversed common development thinking. Individuals and interactions became more important than processes and tools. Working software was preferred over comprehensive documentation. Responding to change was the rallying cry, not rigidly following a plan. The consequence? Agile developers see DBAs as a bottleneck in the process. While DBAs have a duty to be the guardians of the database, developers chomp at the bit to get their releases out.
Data is getting bigger
When Codd was busy writing his white paper, mainframes were still around and IT people talked in terms of kilobytes of data. When relational databases emerged at the end of the 1970s, megabytes of data were being planned for. The arrival of the web in the 1990s made us start thinking in gigabytes. Those gigabytes soon grew to terabytes. Now we’re talking zettabytes.
We’re all familiar with a gigabyte. A single zettabyte is 1,000,000,000,000 gigabytes. And a variety of forecasters predict that the exponential growth in data will result in the need to store 44 zettabytes of data worldwide by 2020. That’s compared to around 4.4 zettabytes today. A ten-fold increase.
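The arithmetic behind those figures is easy to check. Here is a minimal sketch, assuming decimal (SI) units in which a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes. The 4.4 and 44 zettabyte values are the forecasts quoted above, and the variable names are purely illustrative.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: SI, not binary, units).
GIGABYTE = 10**9
ZETTABYTE = 10**21

# How many gigabytes make up one zettabyte.
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE

# Forecast figures quoted in the text, in zettabytes.
today_zb = 4.4
forecast_2020_zb = 44

# Ratio between the forecast and today's volume.
growth_factor = forecast_2020_zb / today_zb

print(f"{gigabytes_per_zettabyte:,} gigabytes per zettabyte")
print(f"growth factor: roughly {growth_factor:.0f}x")
```

In other words, one zettabyte is a million million gigabytes, and the forecast implies roughly ten times today’s volume.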
It’s not just the size of the data that is going to cause an issue. It’s the form of the data. Alongside the structured data we currently work with, there will be Big Data – unstructured, semi-structured, and multi-structured data. All kinds of data from a multitude of sources that needs to be processed, stored, and analyzed.
The Internet of Things is arriving
One of the sources of unstructured and multi-structured big data will be the Internet of Things. Lots and lots of things out there, collecting data and sharing data with lots of other things, using the internet as the communications channel.
The first building blocks of the Internet of Things are already out there. Apple is offering its HomeKit, a suite of tools for controlling devices in the home. Google has invested in the home automation system, Nest. And utilities are jumping in to claim the space for themselves, promising the ability to improve healthcare, communication and entertainment, and security.
And all of this is before business and industrial applications take off, adding sensing, monitoring, and remote management to everything from cars to streetlights to the heating systems within large commercial buildings.
Open Source is opening new doors
Relational databases still dominate the market, but new kinds of databases are emerging to handle the multiple data types that are now being created. They’re mostly open source, and they’ve been developed to handle large-scale processing of data-sets across clusters of commodity hardware.
Hadoop, for example (named after a toy elephant belonging to the son of one of its founders), is gaining more and more traction as the platform that makes big data easier to manage. It’s not relational, it stores files and processes data in a completely different way to SQL Server, and it’s not alone.
MongoDB promises to be more agile and scalable. Couchbase boasts that it provides the world’s most complete, most scalable and best performing NoSQL database. Alongside them, companies like Amazon, Google, Facebook, and LinkedIn all run on proprietary NoSQL databases.
Interestingly, a lot of this non-relational activity appears to be planned far away from the DBA. In its 2014 State of Database Technology Report, InformationWeek found that the NoSQL distributed database platform, Riak, was used by just 1% of respondents to its survey. Yet more than one third of the Fortune 100 employ the technology, supporting InformationWeek’s conclusion that non-relational databases are not replacing relational databases. Instead, they’re augmenting them to offer companies new business analytics capabilities that DBAs just aren’t involved with.
The age of one company, one database, is over
For years – decades, even – companies and organizations have traditionally chosen one kind of database and stayed with it. Be it Microsoft SQL Server or Oracle, MySQL or IBM DB2, there was only room for one budget and one database.
Virtualization helped, with its ability to run different databases – and many databases – on the same system, but the general truism remained. There was one overall system with the DBA at the center of everything.
That’s going to change and in many ways has to change to accommodate the massive rise in data and types of data that’s on the way.
Alongside the traditional relational databases, non-relational databases are going to become the norm in many companies and organizations. Each with an important role to play in the business.
Big Data is just … data
Barely has the term Big Data become accepted and already analysts are out there saying that Big Data is just, well, data. A lot of it, but data all the same that has to be collected, processed and analyzed.
The thing is that much of the data isn’t the structured data we’re comfortable with in the relational database world. It will be multi-structured data. Massive piles of the stuff that has to be processed as fast as possible because there’s another massive pile that will arrive tomorrow morning.
Platforms like Hadoop are perfect for processing it into a meaningful form. But guess what? Once it’s processed, relational databases will be waiting in the wings to analyze it.
So make friends with Hadoop, MongoDB, Couchbase, Riak, etc. They’re not the enemies waiting to steal your paycheck. They’re the allies that will make sure your paycheck comes in on time.
It’s time to think a little differently
In the past, applications have come along and they’ve been updated annually or every six months. There has been time to follow a natural process from testing through to staging through to production. Today’s applications are different. They typically demand high availability, a high volume of writes, geographic distribution and lots of upgrades.
All of which means that the methodical release process of yesterday has to be replaced with a faster, more efficient process. One that borrows some of the Agile thinking and starts to use methods like version control, continuous integration, and automated deployment. That way, rather than being the bottleneck that slows developers down, DBAs become the enablers that help developers work faster – and protect the data at the same time.
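One way to picture version-controlled, automated database deployment is a set of numbered migration scripts that are applied in order and recorded in the database itself. The following is a minimal sketch under stated assumptions, not a definitive implementation. It uses Python’s built-in sqlite3 module as a stand-in for a production database, and the schema_version table, the MIGRATIONS list, the customers table, and the apply_migrations function are all hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical migrations; in practice each would be a script kept in version control.
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def apply_migrations(conn):
    """Apply migrations not yet recorded in the database; return the versions applied."""
    # Bookkeeping table (hypothetical name) that records which versions have run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()
    current = row[0]

    applied = []
    for version, statement in sorted(MIGRATIONS):
        if version <= current:
            continue  # already applied on an earlier run
        conn.execute(statement)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        applied.append(version)
    return applied


# An in-memory database stands in for a real test, staging, or production target.
conn = sqlite3.connect(":memory:")
first_run = apply_migrations(conn)   # applies every pending migration
second_run = apply_migrations(conn)  # finds nothing left to apply
print(first_run, second_run)
```

Because the database records its own version, the same script can be run by a continuous integration job against test, staging, and production without applying any change twice.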
There’s more than enough room for everyone
Remember that estimate we saw before, of the forecast need to process, analyze and store 44 zettabytes of data by 2020? A lot of that will be whisked away to non-relational databases, which are better able to handle unstructured and multi-structured data. But a large part of it will be heading in your direction as well. If it’s structured data, relational databases will remain the natural home for it.
The only difference is that in the near future – and it’s not far away – you’ll be sitting alongside the people doing the non-relational stuff. Make friends. It’s going to be an interesting journey.