The 2002 movie Minority Report is about a police unit called PreCrime, which can predict when people will commit a crime so they can be arrested before it happens. Things go awry when a team member played by Tom Cruise is himself “pre-accused.” To make predictions, they use technology and the special abilities of the … Read more
Machine learning projects often stall when it's time to deploy. Shree Das introduces Kubeflow for data scientists, an end-to-end solution for ML projects.… Read more
Distributed File Databases manage large amounts of unstructured or semi-structured data. They are designed on the principle of splitting up the data into multiple locations, and then placing the code that processes each fragment close to, or directly on, that location. Buck Woody shows how to install Hadoop in your Data Science lab to experiment with an example of the breed.… Read more
Though the Key/Value pair paradigm is common to almost every computer language, there is no clear agreement yet for the definition of a Key/Value Pair database. However, Key/Value pair databases are valuable for special applications where speed of writing data is more important than searching and general versatility. They are certainly worth experimenting with in a data science lab.… Read more
There is no better way of understanding new data processing, retrieval, analysis or visualisation techniques than actually trying things out in a lab system. Buck Woody continues his series by explaining why an RDBMS is essential for a lab, what that is, and how to install SQL Server into the lab. … Read more
Although every computer language is suitable for data, some languages lend themselves especially well to working with certain types or sources of data, or processing the data in certain ways, and so are of particular use to the data scientist. … Read more
Data tools interact directly with data and are great for automating data acquisition, but they aren't always the best way to prototype or pilot a process. Interactive data tools also allow you to test and refine the process, until it is ripe for automation. … Read more
It is sensible to check the performance of different solutions to data analysis in 'lab' conditions. Measurement by instrumentation makes it easier to develop systems that are efficient.… Read more
Anyone who is frequently faced with preparing data for processing needs to be familiar with some industry-standard text-manipulation tools. Awk, join, sed, find, grep and cat are the classics, and Buck Woody takes them for a spin in his Data Science Laboratory… Read more