Querying PostgreSQL: Learning PostgreSQL with Grant

This is part of an ongoing series of post by Grant as he takes his vast knowledge of SQL Server and applies it to adding PostgreSQL and shares it with you so you can skip learn from his triumphs and mistakes. For more you can go to the Learning PostgreSQL with Grant series home page

Writing queries to retrieve the data from a database is probably the single most common task when it comes to working with data. Working with data in PostgreSQL is no exception. Further, PostgreSQL has an incredibly rich, wide, and varied set of mechanisms for retrieving data. From standard SELECT… FROM… WHERE to windowing functions and recursive queries, PostgreSQL has it all. I honestly can’t do it justice in a single article. Further, since so much of this functionality is effectively identical to where I’m more comfortable, SQL Server, I’m not turning this into a PostgreSQL 101 on the SELECT statement.

Instead, for this series, I’m just going to assume I may have more than one article on querying PostgreSQL. For this entry in the series, I’m going to focus on the core behaviors of SELECT, FROM and WHERE with an emphasis on what’s different from SQL Server. This won’t be a fundamental how-to on querying PostgreSQL, but instead an exploration of the gotchas you’re likely to experience coming in with existing knowledge of how you think these things should work. And hoo boy, there’s some fun stuff in there. Let’s get stuck in.

In the sample database I’ve created as a part of this ongoing series, I created a couple of schemas and organized my tables within them. If you wish to execute the code or look at the data structures, the code is in my ScaryDBA/LearningPostgreSQL repository here. The objects and database you will need can be created/reset using the CreateDatabase.sql script, then adding sample data using the SampleData.sql script. After executing that script, execute the Sample The rest of the code in this article is in the  10_Select folder.

FROM

I actually love how the PostgreSQL document defines what you’re doing in the FROM clause:

Trivial table expressions simply refer to a table on disk, a so-called base table, but more complex expressions can be used to modify or combine base tables in various ways.

While I wouldn’t myself define it this way, I find it to be interestingly more accurate than simply saying “table.” However, since you can make a query on a table expression, it is more than simply saying “tables.” Not all the definitions are useful though. Take this further explanation of what defines a table expression:

The result of the FROM list is an intermediate virtual table that can then be subject to transformations by the WHERE, GROUP BY, and HAVING clauses and is finally the result of the overall table expression.

It sounds like the FROM clause is a temporary table or something. Yet, I know (reading ahead in the docs) that execution plans in PostgreSQL are similar to SQL Server and this description leads us down a path: virtual table, that’s not accurate, depending on the whole query, structures involved and statistics. While it can help to visualize this way, it certainly is not implemented this way.

However, the rest is what I expect. List table names, and/or, define a table through a sub-SELECT. Aliasing, everything, pretty much the way it works in T-SQL because that’s the way it works in the ANSI SQL Standard, to which PostgreSQL complies very closely.

JOIN

So much of the FROM clause in PGSQL is the same as T-SQL. The first big difference is in the use of JOIN operations. The standard join operators are the same: INNER, LEFT/RIGHT OUTER, CROSS, FULL OUTER. These all perform the same logical functions. The fun stuff is in the internal syntax. Such as the USING clause:

The first time I did this, and it worked, it honestly felt like black magic. Of course, for true black magic, we’d have to use the NATURAL clause:

Basically, PostgreSQL figures out, based on naming and data type, which are the common columns between two tables, so you don’t have to define the JOIN criteria at all. Also note that I left off the syntactic sugar of INNER. PostgreSQL figured out what I wanted, just like SQL Server does. I like this standard which also allows me to do this:

Also worth noting, the AS word is optional. Personally, I prefer it, but I’m currently using DBeaver which supplies an alias (similar to SQL Prompt) but leaves off the AS key word.

Natural joins are a nice feature but beware that if your naming standard allows you to have the same name in multiple tables (for example a Name or Description column), you may not get the results you expect. Also, worth noting is if the tables do not contain a column with the same name, you will get the same results as if you use a CROSS JOIN. Hence this is something I would rarely use in production code but will be very nice doing ad hoc querying.

LATERAL

In addition to subqueries, there are also table valued functions. There are some differences in how they work, but the devil there is in the details, not the larger behaviors. When using subqueries and functions, you can get the T-SQL equivalent of a CROSS APPLY by using the PostgreSQL version, LATERAL.

Just like CROSS APPLY in T-SQL, you basically get a functional loop. By that I mean that for each row in the radio.bands table, you’ll get an execution of the LATERAL query. Any columns referenced from the outer tables (in this case radio.bands), will be available as parameters to the query.

The example above could be rewritten just using a traditional join to get the same results. However, when it comes time to start to use functions with parameters to return data, LATERAL becomes very useful. You can also use LEFT JOIN LATERAL to get the equivalent of a LEFT JOIN in the execution.

WHERE

Apart from unique PostgreSQL functions within the WHERE clause, it’s basically the same as T-SQL. No real weird surprises in functionality. For example, you can use the WHERE clause to define join criteria instead of using ON. However, just like in T-SQL, if you do this with an OUTER JOIN, you’re effectively turning it into an INNER JOIN (yes, I know that’s not the complete story, but it is, basically, what happens).

Mind you, the sheer number of value expressions as well as functions and operators is daunting. I’m not going to attempt to explain even a partial listing of what’s possible. Suffice to say, you can do a lot of things in the WHERE clause in PostgreSQL that you simply can’t in SQL Server. For example, in T-SQL, you have the LIKE operator to search a string for similar values. In PostgreSQL you also get SIMILAR TO <regular expression> and POSIX <regular expression>.

SELECT

I chose the order, FROM, WHERE, SELECT, because in PostgreSQL, as in SQL Server, this is the actual order of operations. In a simple query, you simply define where the data is originating: FROM. Then you apply filters to that data: WHERE. Finally, you decide what is being returned, SELECT.

By the nature of how a query works, all the examples I’ve listed so far show the basics of the SELECT operation. It’s very much the same as in T-SQL. I can specify columns from the defined tables and use the alias of those tables to make the code clearer (and shorter). In fact, most behaviors I’m used to, I can see: DISTINCT, WITH, UNION, ORDER BY and more.

However, there are some really interesting behaviors. For example, if you leave off a column alias like this:

You get a name automatically, “?column?“:

There are also some really interesting functions and clauses that change the way queries behave. For example, LIMIT, works very similarly to TOP:

Since there is no TOP operator, this is what you would use. However, there’s a wrinkle. You can actually add OFFSET to this and then it will return 3 rows, starting at the row you specify:

Just like TOP, if you don’t specify the order, you can get inconsistent results.

Another one is FETCH. It works a lot like LIMIT (SQL Server has OFFSET (part of the ORDER BY clause and FETCH clauses, but the syntax is quite different):

I can use the OFFSET in the same way. However, I get some additional behavior too. Instead of ONLY, I can tell it WITH TIES. In that case, if more than one value meets the top 3 criteria, we’ll get any ties, meaning, possibly, more than 3 rows, depending on how many rows match.

There are several additional differences and some other unique behaviors, but that’s most of the big stuff. I could spend a lot of time talking about the differences in window functions and locking, but there’s enough there to make independent articles just on those topics, so I’ll skip them here.

Conclusion

Mostly, when querying PostgreSQL, I just start typing and most of the time it works. There really are more similarities than differences. However, the differences are worth noting, especially when it comes to something like TOP vs. LIMIT/FETCH. Overall, these are small things, but they will act as a gotcha when you’re writing your code.