Missing Data

In the real world of business or scientific reporting and analysis, data can prove to be awkward. It can be plain wrong or it can be altogether missing. Sure, we have the NULL to signify unknown, but that doesn't play well with regular business reporting. There are a number of ways of dealing with missing information, and methods of estimating data from existing data has a long and respectable history. Joe Celko gets to grips with a data topic that is often treated with some trepidation. … Read more

Simple SQL: Random Thoughts

How does one get a truly random sample of data of a certain size from a SQL Server database table. Well, there are simple non-portable tricks one can use, such as the NewID() function, but then refining those can be tricky. Take the Rand() function for a start. Can it really provide you with a truly random number? Why doesn't the TABLESAMPLE clause give you a set number of rows? Joe Celko scratches his head a bit, explains some of the issues and invites some suggestions and tricks from readers.… Read more

SQL Data Aggregation Aggravation

When we have to deal with and store a lot of data, it makes sense to aggregate it so that we store only the information we actually need. If we get this right, this works well, but the design of the system takes care and thought because the problems can be subtle and various. Joe Celko describes some of the ways that things can go wrong and end up providing incorrect, inaccurate or misleading results.… Read more

Simple SQL: Attribute Splitting

If the design of a relational database is wrong, no amount of clever DML SQL will make it work well. Dr. Codd’s Information Principle is that you have, inside the entity tables, the columns that model the attributes of that entity. The columns contain scalar values. Tables that model relationships can have attributes, but they must have references to entities in the schema. You split those attributes at your peril. Joe Celko explains the basics.… Read more

Handling Graphs in SQL

Many practical database problems can be tackled more simply and intuitively by graphs or networks, which in this sense are graphs in which attributes can be associated with the nodes and edges. It is a natural way to study relationships within the data. SQL databases aren't the easiest way of doing it, but it makes sense where the scale permits it. Because of the range of graphs and techniques, some Graph theory is unavoidable before you get stuck into the code, and who better to introduce graph databases than Joe Celko? … Read more

Predicates With Subqueries

The ALL, SOME and ANY predicates aren't much used in SQL Server, but they are there. You can use the Exists() predicate instead but the logic is more contorted and difficult to read at a glance. Set-oriented predicates can greatly simplify the answering of many real-life business questions, so it is worth getting familiar with them. Joe Celko explains.… Read more

Formatting SQL Code – Part the Second

When you're formatting SQL Code, your objective is to make the code as easy to read with understanding as is possible, in a way that errors stand out. The extra time it takes to write code in an accessible way is far less than the time saved by the poor soul in the future, possibly yourself, when maintaining or enhancing the code. There isn't a single 'best practice, but the general principles, such as being consistent, are well-established. Joe Celko gives his take on a controversial topic.… Read more

The Practical Problems of Determining Equality and Equivalence in SQL

In theory, it is easy to determine whether data is equal to, greater than or lesser than a value. Real-world data types can be surprisingly tricky, even for gauging data equivalence. Because SQL cannot remain aloof from any application's data types, you need to be aware of the ways and pitfalls of individual datatypes and how you can test for equality, equivalence, similarity, identity and all that jazz… Read more

Declarative SQL: Using References

There are several ingenious ways of using SQL References to enforce integrity declaratively. Declarative Referential Integrity (DRI) is more effective than using procedural code in triggers, procedures or application layers because it uses the SQL paradigm, thereby making optimisation easier and providing clearer expression of the rules underlying the data.… Read more

Declarative SQL: Using UNIQUE Constraints

In SQL, you can express the logic of what you want to accomplish without spelling out the details of how the database should do it. Nowhere is this more powerful than in constraints. SQL is declarative, and Joe Celko demonstrates, in his introduction to Declarative SQL, how you can write portable code that performs well and executes some complex logic, merely by creating unique constraints.… Read more

SQL Style Habits: Attack of the Skeuomorphs

Although we like to think that our programming techniques are progressive and in tune with the bleeding edge of software development practices, too often they are directly influenced by restrictions faced in the post-war decades when computers first became mainstream. As these restrictions no longer apply, is it time to relinquish such things as cursors, 'tibbling', storing display formats, using short names for symbols and primary keys?… Read more

The DRI Subject of References

A database must be able to maintain and enforce the business rules and relationships in data in order to maintain the data model. It does this through referential constraints. They aren't complex, but are powerful, especially with the means to attach DRI actions to them. Joe Celko explains all, and pines for the ANSI CREATE ASSERTION statement… Read more

Data Is Crazier than You Think

As a society, we have an unrealistic respect for data, especially if it has a decimal point somewhere and uses metric units. We who are in the business of data need to cultivate a renewed interest in the sceptical and rigorous science of statistics: it is too important to leave to 'Data Scientists'. If the data is wrong, or the way we analyse or report it is misleading, much of what we do is pointless… Read more

How you log in to Simple Talk has changed

We now use Redgate ID (RGID). If you already have an RGID, we’ll try to match it to your account. If not, we’ll create one for you and connect it.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.

Continue

Simple Talk now uses Redgate ID

If you already have a Redgate ID (RGID), sign in using your existing RGID credentials. If not, you can create one on the next screen.

This won’t sign you up to anything or add you to any mailing lists. You can see our full privacy policy here.

Continue