The basic T-SQL SELECT statement

The SQL language is used across many relational database platforms. Greg Larsen explains the basics of the SELECT statement for SQL Server.

The SELECT statement is the most used statement in the T-SQL language. It is executed to retrieve columns of data from one or more tables. The SELECT statement can constrain the data returned by using the WHERE or HAVING clause, and sort or group results using the ORDER BY and GROUP BY clauses, respectively. A SELECT statement can also be made up of many different SELECT statements, commonly called subqueries.

There are many different aspects of the SELECT statement, making it a very complex T-SQL statement. This article is the first in a series of articles that will cover the various nuances of the SELECT statement. In this article, I will discuss only the basics of the SELECT statement.

The SELECT statement

The SELECT statement has many different pieces, making it multifaceted. The main clauses supported by the SELECT statement as found in the Microsoft Documentation are shown in Figure 1.

An image showing the many options of the SELECT statement

Figure 1: Main clauses of the SELECT statement

The SELECT statement supports many different options. To discuss each of these various options would require a very long article. For the purpose of this article, I will only be focusing on a few clauses, which I am presenting as the basic SELECT statement syntax shown in Figure 2.

An image showing the three parts: SELECT list, FROM table_source, WHERE search_condition

Figure 2: Basic SELECT statement

The basic SELECT statement has three parts: the select list, the FROM clause, and the WHERE clause.

The SELECT list

The select_list component identifies the data that will be returned when the SELECT statement is executed. This is the only part that is required in every SELECT statement. All other attributes of a SELECT statement are options.

The select_list identifies one or more items of data that will be returned when the SELECT statement is executed. When multiple items are identified in the list, each item is separated by a comma. The arguments of a select list can take on many different forms, like a column name, a calculation, a function call, or a literal constant, to name a few.

The format of the select_list as found in the SELECT clause documentation is shown in Figure 3.

An image showing many options for the SELECT list

Figure 3: Select_list specifications

Listing 1 has an example of a SELECT statement that contains only a selection list. There are two different columns of data in the list that will be returned.

Listing 1: Returning a calculated field and literal value

Report 1 shows the two column values returned when Listing 1 is executed.

Report 1: Columns returned when Listing 1 is executed

The first column returned is the result of a calculation that multiplies 1024 by1024. The second column displayed is just the literal string The Basic Select Statement. The columns returned were not given column names, hence why the column names say (No column name).

Not having a column name for some columns in the result set is not really a problem, unless you need to reference the returned data by name. To assign names to these two columns, a column alias can be used.

Assigning column aliases

A column alias is useful when you want to assign a specific name to a column of data returned from a SELECT statement. If you refer to the select_list specifications found in Figure 3 there are two ways to define a column alias. One way is to use the AS keyword, followed by the name of the column alias. The AS keyword is optional. The other way is to use the equal operator (=), where the alias name is defined on the left side of the = sign. The SELECT statement in Listing 2 returns the same data as Listing 1, but each item identified in the selection list has now been given a column alias name.

Listing 2: Defining column aliases

When Listing 2 is executed each column returned will have a column alias name assigned, as shown in Report 2.

Report 2: Results from execution of Listing 2

An image showing table results: NumOfBytesInMB, BasicSelectStatement, 1048576, The Basic Select Statement

In Listing 2, both column alias names do not contain any spaces in the name. If you want to include spaces in an alias name, then the name needs to be enclosed in quotes. Quotes can either be a set of brackets or single or double quotation marks. Lisitng 3 shows how to use the different quoting options.

Listing 3: Creating alias names with spaces

The output in Report 3 is produced when Listing 3 is executed.

Report 3: Aliases with spaces in name

The FROM clause

The FROM clause in SQL Server is used to identify a table or tables in which to retrieve data. In this basic SELECT clause article, I will only discuss retrieving data from a single SQL Server table. Future articles will talk about retrieving data from multiple tables.

A table within SQL Server is stored in a database. A given SQL Server instance might have many databases, and there might be many tables within a database. Tables within a database are grouped and organized by a schema. The hierarchy of databases, schemas, and tables means there might be multiple tables within a SQL Server instance that have the same name. Because a given table name could reside in various schemas, databases, or even instances, the table name used must be uniquely identified in the FROM clause. To uniquely identify a table in the FROM clause, it can be named using a one, two, three, or four-part name, where each part is separated by a period (.).

A one-part table name is a table name that doesn’t contain a period, like Orders or Customers. When a one-part name is used in the FROM clause, the database engine needs to determine which schema owns the table. To identify the schema that owns a one-part table name SQL Server uses a two-step process. The first step is to see if the table resides in the default schema associated with the user submitting the SELECT statement. If the table is in the default schema for the user, then that table is used, and the database engine doesn’t need to process the second step. If the table is not found in the user’s default schema, then the second step of identifying the table is performed. The second step looks into the dbo schema to try and find the table. When a database contains only a single schema named dbo, using one-part names makes sense. However, when there are multiple schemas in a database, it is best to use multiple part table names to clarify what table is used. It simplifies the amount of work SQL Server needs to perform to identify the table.

A two-part name is made up of both the table name and schema that contains the table with a period (.) in between, like Sales.Orders, or Sales.Customer. When writing SELECT statements that only query data from tables in a single database with multiple schemas, the best practice is to use two-part table names.

In most cases, when a SELECT statement uses tables in a single database, one and two-part table names are used. Three-part table names are needed when code is run in the context of one database and needs to retrieve data from another database, or you are joining data from multiple tables that live in different databases. The third part of the name precedes the two-part table name and identifies the database in which the table lives, like AdventureWorks2019.Sales.Orders.

The last way to uniquely identify a table in the FROM clause is to include the instance name as the fourth part. The instance name is placed in front of the fully qualified three-part table name. Four-part names are used to query across instances using a linked server. Discussion of Linked servers is an entirely separate topic, which I will leave for a future article.

The SELECT statement example in Listing 4 runs in the context of the AdventureWorks2019 database using a two-part table name to return all the data in the Territory table, owned by the Sales schema.

Listing 4: Using two-part table name

The statement in Listing 5 uses a three-part table name to return all the data from the Territory table in the Sales schema within the AdventureWorks2019 database, but it runs in the context of the tempdb database.

Listing 5: Using three-part table name

The rows displayed in Report 4 are produced when either the code in Listing 4 or Listing 5 is executed.

Report 4: Output when Listing 4 or 5 is run

An image showing tabular results of querying all the columns of the Sales.SalesTerritory table

In Listing 4 and Listing 5, the * wildcard character was used for the select_list option. The * tells SQL Server that all columns from the AdventureWorks2019.Sales.SalesTerritory table are to be returned based on their ordinal positions within the table. Using the * wildcard character in the selection list is a simple way to specify and select all the columns in a table. I wouldn’t recommend using wildcard characters for production code because if columns are added or dropped from the table being queried, the number of columns returned will be based on the table definition at the time of the query. Best practice is not to use wild card characters and instead specifically identify column names for the data that needs to be returned. Note that there are some situations where wildcards do make sense, and I’ll cover those in later articles. The SELECT statement in Listing 6 returns the same results as Listings 4 and 5 but specifics the actual quoted column names instead of the * wildcard character.

Listing 6: Specify column names

The WHERE clause

The SELECT statement examples I have shown so far returned all the rows in the AdventurewWorks2019.Sales.SalesTerritory table. There might be times when you don’t want to return all the rows in a table but instead only want to return a subset of rows. When this is the case, the WHERE clause can be used to restrict the rows returned.

When using the WHERE clause, search criteria needs to be specified. The search criteria identifies one or more expressions, known as predicates, that each row must meet in order to be selected. The logical operators AND, OR, and NOT can be used to string together multiple expressions for fine-tuning the exact rows that should be returned.

Listing 7 has a SELECT statement that contains a single expression in the WHERE clause. In this case, only the Sales.SalesTerritory records that have their SalesLastYear value greater than 3,000,000 will be returned.

Listing 7: Simple search condition in WHERE statement

When the code in Listing 6 is executed the rows in Report 5 are returned.

Report 5: Rows returned when Listing 7 is executed

An image showing tabular results. Only 6 rows returned because of the WHERE clause

Sometimes more complex search conditions are needed to produce a specific subset of rows. The SELECT statement in Listing 8 uses two different conditions in the WHERE clause to narrow down the rows returned. The two different conditions use the logical AND operator to create a complex compound search condition.

Listing 8: Using two expressions with the AND operator

Report 6 shows the output created when Listing 8 is run.

Report 6: Results when Listing 8 is executed

An image showing tabular results. Only 1 row returned because of multiple predicate where clause

The basic T-SQL SELECT statement

The SELECT statement is the most used statement in the T-SQL language. Understanding the basics of the SELECT clause in the foundation of all other formats of the SELECT statement. Identifying columns in the selection list, tables in the FROM, and constraints in the WHERE clause will allow you to return any subset of rows from a database table. I will discuss other more complex aspects of querying SQL Server data using the SELECT statement in future articles.