01 June 2017

Statistics in SQL: The Kruskal–Wallis Test

Before you report conclusions about your data, have you checked whether your 'actionable' figures occurred by chance? The Kruskal–Wallis test is a safe way of determining whether samples come from the same population, because it is simple and doesn't assume that the population is normally distributed. This gives you a measure of confidence that your results are 'significant'. Phil Factor explains how to do it.
21 March 2014

Data Science Laboratory System – Distributed File Databases

Distributed File Databases manage large amounts of unstructured or semi-structured data. They are designed on the principle of splitting the data across multiple locations, and then placing the code that processes each fragment close to, or directly on, that location. Buck Woody shows how to install Hadoop in your Data Science lab to experiment with an example of the breed.
31 January 2014

Data Science Laboratory System – Object-Oriented Databases

Object-Oriented Databases (OOD) avoid the object-relational impedance mismatch altogether by integrating so tightly with the user-level OOP code that they are simply an engine that ships with the code itself. The developer is able to instantiate OOD objects directly in the code. Buck Woody explores the Object-Oriented breed of database in his Data Science lab.
04 December 2013

Data Science Laboratory System – Graph Databases

Graph databases are an intriguing alternative to the relational model. They apply graph theory to record the relationships between entries more naturally, and are a good fit for a range of data tasks that are difficult in SQL. Buck Woody gives an introduction to Graph databases and shows how to get Neo4J up and running to get familiar with the technology.
07 October 2013

Data Science Laboratory System – Document Store Databases

A Document Store Database (DSD) is similar to a Relational Database Management System, with the exceptions that a DSD allows for unstructured data and for sharding a single database across multiple machines. So when or why would you choose a document database over a relational one? Buck Woody has the answer, and an example using the DSD MongoDB on his lab system.
17 July 2013

Data Science Laboratory System – Key/Value Pair Systems

Though the Key/Value pair paradigm is common to almost every computer language, there is no clear agreement yet on the definition of a Key/Value Pair database. However, Key/Value pair databases are valuable for special applications where speed of writing data is more important than searching and general versatility. They are certainly worth experimenting with in a data science lab.