The first two articles in this series demonstrated how PostgreSQL is a capable tool for ELT – taking raw input and transforming it into usable data for querying and analyzing. We used sample data from the Advent of Code 2023 to demonstrate some of the ELT techniques in PostgreSQL. In the first article, we discussed … Read more
Let’s start by defining a subset and why you would require a data subset? When dealing with the development, testing and releasing of new versions of an existing production database, developers like to use their existing production data. In doing so, the development team will be hit with the difficulties of managing and accommodating the … Read more
In my previous post, I showed how to borrow a snake draft concept from fantasy football, or a packing technique from the shipping industry, to distribute different portions of a workload to run in parallel. In the previous example, we determined a distribution order for databases based on size – though you can rank by … Read more
I recently had a restore job where I needed to split the work up into multiple parallel processes (which I’ll refer to here as “threads”). I wanted to balance the work so that the duration was something significantly less than the sum of the restore times. Imagine a job that loops through and restores each … Read more
Finally, mirroring is available for Fabric! You can mirror an Azure SQL to Fabric. It works for CosmoDB and Snowflake as well, but in this article, I will focus on Azure SQL. It is 100%, no, but it is definitely a feature that is really great even now. Before getting into a step-by-step of the … Read more
In the first article in this transforming data series, I discussed how powerful PostgreSQL can be in ingesting and transforming data for analysis. Over the last few decades, this was traditionally done with a methodology called Extract-Transform-Load (ETL) which usually requires external tools. The goal of ETL is to do the transformation work outside of … Read more
As a data professional, there is a set of tools that you use on pretty much a daily basis. Before I started as the editor of Simple-Talk, there were two Microsoft tools I used every day of the work week, and also for my hobby work: SSMS (SQL Server Management Studio) and SSDT (SQL Server … Read more
In the previous article in this series, I demonstrated how to build and run an aggregate statement in MongoDB Shell. An aggregate statement makes it possible to group and summarize a collection’s document data, as well as transform the data and control its output. For the examples in that article, I used the version of … Read more
In this article, I will discuss the history and thinking behind several types of logic that are typically associated with writing relational database code. Because of how relational databases implement NULLs, it definitely can make some processes more complex than expected by may programmers (inexperienced and experienced alike!) Three Valued Logic Missing or UNKNOWN values … Read more
Pagination is a technique for limiting output. Think of Google search results, shopping the electronics category on Amazon, or browsing tagged questions on Stack Overflow. Nobody could consume all of the results in a single shot, and no site wants to spend the resources required to present them all to us, so we are offered … Read more
Ever wonder all the reasons that we use databases instead of file systems? While we don’t think of it too much anymore, the first reason that databases came into existence was to remove redundancies. The first source of redundancy back in the dark ages, when I was just beginning the program, was a product called … Read more
In our data hungry world, knowing how to effectively load and transform data from various sources is a highly valued skill. Over the last couple of years, I’ve learned how useful many of the data manipulation functions in PostgreSQL can supercharge your data transformation and analysis process, using just PostgreSQL and SQL. For the last … Read more
If you haven’t migrated your workloads to a managed database platform yet, you’re probably still relying on SQL Server Agent for various maintenance and other scheduled tasks. Most of the time, these processes just work. But when it’s time to troubleshoot, it can be cumbersome to get to the root of some problems. In this … Read more
In the previous article in this series, I discussed how to use a find statement to retrieve data from a MongoDB collection. However, the find method is not the only option for retrieving document data. Another important method is aggregate, which lets you group documents, perform calculations on those groups, and in other ways extract … Read more
String manipulation is an inevitable task for developers and data professionals alike. Despite all the best efforts to normalize databases, eventually we are faced with some sort of text-based data stored within a relational database and need to extract detailed information from it. Those of us who have tackled these challenges fully understand how code … Read more
Lately, I’ve seen many people struggling to upgrade their workstation to the latest version of SQL Server. The main source of the problem is usually the web installer/wrapper, which can fail for a variety of reasons that aren’t always made clear by the error messages. Today, I’m going to walk through one way you can … Read more
One of the most useful constructs in SQL Server is the stored procedure. It gives you a way to do several things. First up, you can store code within the database. Next, you can parameterize queries so that you’re not hard coding or generating ad hoc queries every time you want to call them. You … Read more
This is for the folks who still have to log into remote machines and do work manually on the box. Yes, we still exist, and we will for as long as we’re still using physical servers in data centers and even IaaS. Not everyone has transitioned to server core and full-on PowerShell remoting for everything. … Read more
Nikola Ilic, best known as Data Mozart, published a great article and video about how to make semantic model data available in Microsoft Fabric. This allows the data to be used in lakehouses or data warehouses. One major question that arises is, “should we use a top-down or bottom-up (or both) approach in Microsoft Fabric?” … Read more
Every year, Redgate surveys technologists to ask a big question (through lots of little questions, naturally.) This year’s question was about their current data platform configuration and usage. Just before it was released, I read the results, and I have to say, some of the things I learned amazed me…until I thought a bit more … Read more