Whether you are running an RDBMS or a Big Data system, it is important to consider your data-partitioning strategy. As the volume of data grows, it becomes increasingly important to match the way you partition your data to the way it is queried, to allow 'pruning' optimisation. When you have huge imports of data to consider, it can get complicated. Bartosz explains how to get things right: not perfectly, but wisely.… Read more
It is worth getting familiar with Apache Spark because it is a fast and general engine for large-scale data processing, and you can use your existing SQL skills to get going with analysis of the type and volume of semi-structured data that would be awkward for a relational database. With an IDE such as Databricks you can very quickly get hands-on experience with an interesting technology.… Read more
What is wrong with the Enterprise Data Warehouse? Quite a lot, it seems. If you take the narrow view that the struggle is one of accommodating and interrogating huge quantities of data, then initiatives such as the Virtual Data Warehouse and Logical Data Warehouse could make sense. But what about data quality, security, access control, archiving, retention, privacy and regulatory compliance?… Read more
A Pragmatic Introduction to Building IoT Solutions on the Azure Platform by Rick Garibay. Rick starts his series with an introduction to the Internet of Things.… Read more
A Pragmatic Introduction to Building IoT Solutions on the Azure Platform by Rick Garibay. Rick continues his series covering the topic of ingesting data.… Read more
Was the marketing hook 'The Internet of Things' conjured up before the technical definition? Are we being persuaded to spend money on fending off yet another fantasy tsunami of data? Already, we have televisions that listen to, and report, your conversations; so are we facing the Science Fiction future of gadgets that report where you go, who you visit and what medications you take? As Robert Sheldon says: "It's big, almost too big to get your arms around."… Read more
For once, business people are excited about the importance of data and are interested in the business benefits of extracting insights from it. Perhaps this is more of a cultural than a technical initiative, and so we in the data industry should participate by redefining the concept of the three Vs to reflect business values rather than technical challenges.… Read more
Hadoop and MapReduce have good prospects for adoption as a standard for big data analysis, especially since their adoption by Microsoft. They are ideal for Cloud usage, since one can spin up nodes when required and pay for storage and compute services only whilst they are running. Roger Jennings describes how to get them running on Azure.… Read more
If you are seeking to analyse very large sets of data, and need a rapid, highly parallel way of doing it that scales to your requirements, then 'Cloud Numerics' from Microsoft may be the answer to your prayers.… Read more
Microsoft's OData, the Open Data Protocol, allows consumers of data to get the metadata of data sources, to perform queries and, if necessary, to create, change or delete data items. It now provides the API for many of the available services on the internet, including the Open Government Data Initiativ… Read more