The Fast Route from Raw Data to Analysis and Display

There are very few real breakthroughs in IT. Most progress in the industry comes from pure effort and attention to detail. When breakthroughs do happen, they are like those famous moments in the construction of Alpine tunnels, where two teams start in separate countries and work determinedly for years in opposite directions, apparently only digging themselves deeper into a hole. Then, suddenly and almost magically, they meet in the middle and create a new, faster route.

In retrospect, probably the greatest breakthrough moment in IT was the emergence of the HTML browser, when markup met TCP/IP. I get that same feeling of breakthrough with R and the relational database. The more closely the two can work together, the easier it will be to communicate important things such as world unemployment, population growth, demographics or global warming. More prosaically, it will help with better business graphics, and even with the merely curious, such as the analysis of pop song lyrics.

Over time, the R community has grown beyond its base in data analysis to tackle the difficulties of communicating the results. They've solved many of the problems of plotting and graphing complex data trends. They've also dealt well with making data and graphics available via the web and in print. The web developments have come from integrating R with a range of JavaScript packages, bringing those packages within reach of data people. Rapid print publishing with R and Markdown can be spectacular, particularly if accompanied by a robust design framework.

Database people have come at the problem of communicating trends and facts about data from the opposite direction. Their battles have been over ensuring the veracity, consistency and legitimacy of data, meaning that the data has to be about the right thing and as correct as possible. It is a battle that starts at the point of entry, with constraints, and continues through establishing baselines and checking for anomalies, right up to creating appropriate aggregations in the right format.

Through the twin developments of allowing SQL Server to execute R code and R packages directly, and of giving R easy ways to extract data from SQL Server, I believe we have now joined up the two endeavours, providing a much quicker and more direct route from the data to its communication to the public. The data scientist now has a clear path from the raw data to its final presentation. To be sure, we've been able to do this before, but never with such a wide range of choice.

Even if you just need to visualise trends on servers so that you can quickly spot potential problems, or to represent data better within applications, I reckon the time has come to take a close look at the synergy between SQL Server and R.

Commentary Competition

Enjoyed the topic? Have a relevant anecdote? Disagree with the author? Leave your two cents on this post in the comments below, and our favourite response will win a $50 Amazon gift card. The competition closes two weeks from the date of publication, and the winner will be announced in the next Simple Talk newsletter.