A data lake is a scalable data storage repository that can ingest large amounts of raw data and make it available on-demand. Robert Sheldon explains the benefits and challenges of data lakes.
… Read more
Whether you are running an RDBMS, or a Big Data system, it is important to consider your data-partitioning strategy. As the volume of data grows, so it becomes increasingly important to match the way you partition your data to the way it is queried, to allow 'pruning' optimisation. When you have huge imports of data to consider, it can get complicated. Bartosz explains how to get things right; not perfect but wisely.… Read more
It is worth getting familiar with Apache Spark because it a fast and general engine for large-scale data processing and you can use you existing SQL skills to get going with analysis of the type and volume of semi-structured data that would be awkward for a relational database. With an IDE such as Databricks you can very quickly get hands-on experience with an interesting technology.… Read more
What is wrong with the Enterprise Data Warehouse? Quite a lot, it seems. By taking the narrow view that the struggle is that of accommodating and interrogating huge quantities of data, then initiatives such as the Virtual Data Warehouse and Logical Data Warehouse could make sense. But what about data quality, security, access control, archiving, retention, privacy and regulatory compliance?… Read more
Was the marketing hook 'The Internet of Things' conjured up before the technical definition? Are we being persuaded to spend money on fending off yet another fantasy tsunami of data? Already, we have televisions that listen to, and report, your conversations; so are we facing the Science Fiction future of gadgets that report where you go, who you visit and what medications you take? As Robert Sheldon says; "It's big, almost too big to get your arms around"… Read more