Over the years Power BI has evolved into a complex and varied ecosystem of tools and solutions, which in turn demands several supporting roles: there are, of course, developers, data engineers and data scientists, but there is a need for one more: a capacity administrator. Of course some of these roles may be covered … Read more
Containerization has removed the boundaries that kept developers from working on one application using different systems, thus boosting developer collaboration and speeding up the application deployment process. Containerization involves bundling and packaging applications into containers that have all the necessary dependencies and tools for compiling an application on any operating system. Containers enable the coexistence of legacy … Read more
Over the past years, “traditional” ETL development has morphed into data engineering, which has a more disciplined software engineering approach. One of the benefits of having a more code-based approach in data pipelines is that it has become easier to build metadata driven pipelines. What does this mean exactly? Say for example you need to … Read more
The presentation layer of a headless CMS is separated from the content management system itself, making it a backend-only system for managing, creating, and storing material. Content presentation (how the content is shown on websites or applications) and content creation are handled by the content management system in a standard CMS. Headless CMSes have evolved … Read more
There was a time when I was on a team designing an important IT system for a multinational bank, and the testers arranged for perfectly normal office workers from the bank to try the system out. This was long before the days of instant video. The software team watched from behind a two-way mirror. … Read more
In the previous articles in this series, I demonstrated various ways to retrieve document data from a MongoDB database, using both MongoDB Shell and MongoDB Compass. In this article, my focus shifts from retrieving data to updating data, which is an essential skill to have when working with MongoDB. Whether you access the data directly in … Read more
Before I started as the editor of Simple Talk, I worked on SQL Server. Only. (Ok, I used Redgate’s tools too). But when I started here, one of the goals was to stretch the topics farther and farther into more and more data platforms. And it is not just me in my niche job that … Read more
There are many packages and tools that you can use to facilitate your API development with Rust. Rust has a rich third-party ecosystem of crates for building APIs, including web packages like Actix and Rocket and ORMs like Diesel and SeaORM. This article delves into using Actix and Diesel to build web applications. You’ll learn … Read more
IAsyncEnumerable is a powerful interface introduced in C# 8.0 that allows you to work with sequences of data asynchronously. It is a great fit for building ETLs that asynchronously stream data to get it ready for transfer. You can think of IAsyncEnumerable as the asynchronous counterpart of IEnumerable because both interfaces allow you to easily … Read more
The first two articles in this series demonstrated how PostgreSQL is a capable tool for ELT – taking raw input and transforming it into usable data for querying and analyzing. We used sample data from the Advent of Code 2023 to demonstrate some of the ELT techniques in PostgreSQL. In the first article, we discussed … Read more
In the first part of this two-part series, I covered the mostly non-technical aspects of building a data culture. While the lion’s share of the work will be getting people to work together and embrace ever deeper use of data, for a reader of Simple-Talk, a lot of this transition will be technical. In this … Read more
Let’s start by defining a data subset and explaining why you would require one. When dealing with the development, testing and releasing of new versions of an existing production database, developers like to use their existing production data. In doing so, the development team will be hit with the difficulties of managing and accommodating the … Read more
In my previous post, I showed how to borrow a snake draft concept from fantasy football, or a packing technique from the shipping industry, to distribute different portions of a workload to run in parallel. In the previous example, we determined a distribution order for databases based on size – though you can rank by … Read more
I recently had a restore job where I needed to split the work up into multiple parallel processes (which I’ll refer to here as “threads”). I wanted to balance the work so that the duration was something significantly less than the sum of the restore times. Imagine a job that loops through and restores each … Read more
Finally, mirroring is available for Fabric! You can mirror an Azure SQL database to Fabric. It works for Cosmos DB and Snowflake as well, but in this article, I will focus on Azure SQL. Is it 100%? No, but it is definitely a feature that is really great even now. Before getting into a step-by-step of the … Read more
Rust is emerging as a frontrunner for ensuring memory safety without sacrificing performance. Its growing popularity isn’t solely based on the “fearless concurrency” mantra but also on its expanding ecosystem that fosters integration with various technologies. One domain where Rust proves formidable is database interaction, and a pivotal player in this realm is the … Read more
In the first article in this transforming data series, I discussed how powerful PostgreSQL can be in ingesting and transforming data for analysis. Over the last few decades, this was traditionally done with a methodology called Extract-Transform-Load (ETL) which usually requires external tools. The goal of ETL is to do the transformation work outside of … Read more
As a data professional, there is a set of tools that you use on pretty much a daily basis. Before I started as the editor of Simple-Talk, there were two Microsoft tools I used every day of the work week, and also for my hobby work: SSMS (SQL Server Management Studio) and SSDT (SQL Server … Read more
One of the major trends in enterprise computing, and really in enterprises themselves, is an increased emphasis on data. My career has always revolved around data, but this is a new focus for many parts of the organization. Even business units that traditionally don’t care about data realize that access to more, and better, data … Read more
In the previous article in this series, I demonstrated how to build and run an aggregate statement in MongoDB Shell. An aggregate statement makes it possible to group and summarize a collection’s document data, as well as transform the data and control its output. For the examples in that article, I used the version of … Read more