The SELECT
statement is the most used statement in the T-SQL language. It is executed to retrieve columns of data from one or more tables. The SELECT statement can constrain the data returned by using the WHERE
or HAVING
clause, and sort or group results using the ORDER
BY
and GROUP
BY
clauses, respectively. A SELECT
statement can also be made up of many different SELECT
statements, commonly called subqueries.
There are many different aspects of the SELECT
statement, making it a very complex T-SQL statement. This article is the first in a series of articles that will cover the various nuances of the SELECT
statement. In this article, I will discuss only the basics of the SELECT
statement.
The SELECT statement
The SELECT
statement has many different pieces, making it multifaceted. The main clauses supported by the SELECT
statement as found in the Microsoft Documentation are shown in Figure 1.
Figure 1: Main clauses of the SELECT statement
The SELECT
statement supports many different options. To discuss each of these various options would require a very long article. For the purpose of this article, I will only be focusing on a few clauses, which I am presenting as the basic SELECT
statement syntax shown in Figure 2.
Figure 2: Basic SELECT statement
The basic SELECT
statement has three parts: the select list, the FROM
clause, and the WHERE
clause.
The SELECT list
The select_list
component identifies the data that will be returned when the SELECT
statement is executed. This is the only part that is required in every SELECT
statement. All other attributes of a SELECT
statement are options.
The select_list
identifies one or more items of data that will be returned when the SELECT
statement is executed. When multiple items are identified in the list, each item is separated by a comma. The arguments of a select list can take on many different forms, like a column name, a calculation, a function call, or a literal constant, to name a few.
The format of the select_list
as found in the SELECT clause documentation is shown in Figure 3.
Figure 3: Select_list specifications
Listing 1 has an example of a SELECT
statement that contains only a selection list. There are two different columns of data in the list that will be returned.
Listing 1: Returning a calculated field and literal value
1 2 3 4 |
USE tempdb; GO SELECT 1024*1024, 'The Basic Select Statement'; |
Report 1 shows the two column values returned when Listing 1 is executed.
Report 1: Columns returned when Listing 1 is executed
The first column returned is the result of a calculation that multiplies 1024 by1024. The second column displayed is just the literal string The Basic Select Statement. The columns returned were not given column names, hence why the column names say (No column name).
Not having a column name for some columns in the result set is not really a problem, unless you need to reference the returned data by name. To assign names to these two columns, a column alias can be used.
Assigning column aliases
A column alias is useful when you want to assign a specific name to a column of data returned from a SELECT
statement. If you refer to the select_list
specifications found in Figure 3 there are two ways to define a column alias. One way is to use the AS
keyword, followed by the name of the column alias. The AS
keyword is optional. The other way is to use the equal operator (=
), where the alias name is defined on the left side of the =
sign. The SELECT
statement in Listing 2 returns the same data as Listing 1, but each item identified in the selection list has now been given a column alias name.
Listing 2: Defining column aliases
1 2 3 4 |
USE tempdb; GO SELECT 1024*1024 AS NumOfBytesInMB, 'The Basic Select Statement' BasicSelectStatement; |
When Listing 2 is executed each column returned will have a column alias name assigned, as shown in Report 2.
Report 2: Results from execution of Listing 2
In Listing 2, both column alias names do not contain any spaces in the name. If you want to include spaces in an alias name, then the name needs to be enclosed in quotes. Quotes can either be a set of brackets or single or double quotation marks. Lisitng 3 shows how to use the different quoting options.
Listing 3: Creating alias names with spaces
1 2 3 4 5 6 |
USE tempdb; GO SET QUOTED_IDENTIFIER ON SELECT 1024*1024 AS [Num of bytes in a MB], "Using Double Quotes" = 'The Basic Select Statement', 'ABC' AS 'String ABC'; |
The output in Report 3 is produced when Listing 3 is executed.
Report 3: Aliases with spaces in name
The FROM clause
The FROM
clause in SQL Server is used to identify a table or tables in which to retrieve data. In this basic SELECT
clause article, I will only discuss retrieving data from a single SQL Server table. Future articles will talk about retrieving data from multiple tables.
A table within SQL Server is stored in a database. A given SQL Server instance might have many databases, and there might be many tables within a database. Tables within a database are grouped and organized by a schema. The hierarchy of databases, schemas, and tables means there might be multiple tables within a SQL Server instance that have the same name. Because a given table name could reside in various schemas, databases, or even instances, the table name used must be uniquely identified in the FROM
clause. To uniquely identify a table in the FROM
clause, it can be named using a one, two, three, or four-part name, where each part is separated by a period (.
).
A one-part table name is a table name that doesn’t contain a period, like Orders or Customers. When a one-part name is used in the FROM
clause, the database engine needs to determine which schema owns the table. To identify the schema that owns a one-part table name SQL Server uses a two-step process. The first step is to see if the table resides in the default schema associated with the user submitting the SELECT
statement. If the table is in the default schema for the user, then that table is used, and the database engine doesn’t need to process the second step. If the table is not found in the user’s default schema, then the second step of identifying the table is performed. The second step looks into the dbo schema to try and find the table. When a database contains only a single schema named dbo, using one-part names makes sense. However, when there are multiple schemas in a database, it is best to use multiple part table names to clarify what table is used. It simplifies the amount of work SQL Server needs to perform to identify the table.
A two-part name is made up of both the table name and schema that contains the table with a period (.
) in between, like Sales.Orders, or Sales.Customer. When writing SELECT
statements that only query data from tables in a single database with multiple schemas, the best practice is to use two-part table names.
In most cases, when a SELECT
statement uses tables in a single database, one and two-part table names are used. Three-part table names are needed when code is run in the context of one database and needs to retrieve data from another database, or you are joining data from multiple tables that live in different databases. The third part of the name precedes the two-part table name and identifies the database in which the table lives, like AdventureWorks2019.Sales.Orders.
The last way to uniquely identify a table in the FROM
clause is to include the instance name as the fourth part. The instance name is placed in front of the fully qualified three-part table name. Four-part names are used to query across instances using a linked server. Discussion of Linked servers is an entirely separate topic, which I will leave for a future article.
The SELECT
statement example in Listing 4 runs in the context of the AdventureWorks2019 database using a two-part table name to return all the data in the Territory table, owned by the Sales schema.
Listing 4: Using two-part table name
1 2 3 |
USE AdventureWorks2019; GO SELECT * FROM Sales.SalesTerritory; |
The statement in Listing 5 uses a three-part table name to return all the data from the Territory table in the Sales schema within the AdventureWorks2019 database, but it runs in the context of the tempdb database.
Listing 5: Using three-part table name
1 2 3 |
USE tempdb GO SELECT * FROM AdventureWorks2019.Sales.SalesTerritory; |
The rows displayed in Report 4 are produced when either the code in Listing 4 or Listing 5 is executed.
Report 4: Output when Listing 4 or 5 is run
In Listing 4 and Listing 5, the *
wildcard character was used for the select_list option. The *
tells SQL Server that all columns from the AdventureWorks2019.Sales.SalesTerritory table are to be returned based on their ordinal positions within the table. Using the *
wildcard character in the selection list is a simple way to specify and select all the columns in a table. I wouldn’t recommend using wildcard characters for production code because if columns are added or dropped from the table being queried, the number of columns returned will be based on the table definition at the time of the query. Best practice is not to use wild card characters and instead specifically identify column names for the data that needs to be returned. Note that there are some situations where wildcards do make sense, and I’ll cover those in later articles. The SELECT
statement in Listing 6 returns the same results as Listings 4 and 5 but specifics the actual quoted column names instead of the *
wildcard character.
Listing 6: Specify column names
1 2 3 4 5 6 7 8 9 10 11 12 13 |
USE AdventureWorks2019; GO SELECT [TerritoryID], [Name], [CountryRegionCode], [Group], [SalesYTD], [SalesLastYear], [CostYTD], [CostLastYear], [rowguid], [ModifiedDate] FROM Sales.SalesTerritory; |
The WHERE clause
The SELECT
statement examples I have shown so far returned all the rows in the AdventurewWorks2019.Sales.SalesTerritory table. There might be times when you don’t want to return all the rows in a table but instead only want to return a subset of rows. When this is the case, the WHERE
clause can be used to restrict the rows returned.
When using the WHERE
clause, search criteria needs to be specified. The search criteria identifies one or more expressions, known as predicates, that each row must meet in order to be selected. The logical operators AND
, OR
, and NOT
can be used to string together multiple expressions for fine-tuning the exact rows that should be returned.
Listing 7 has a SELECT
statement that contains a single expression in the WHERE
clause. In this case, only the Sales.SalesTerritory records that have their SalesLastYear value greater than 3,000,000 will be returned.
Listing 7: Simple search condition in WHERE statement
1 2 3 4 |
USE AdventureWorks2019; GO SELECT * FROM Sales.SalesTerritory WHERE SalesLastYear > 3000000; |
When the code in Listing 6 is executed the rows in Report 5 are returned.
Report 5: Rows returned when Listing 7 is executed
Sometimes more complex search conditions are needed to produce a specific subset of rows. The SELECT
statement in Listing 8 uses two different conditions in the WHERE
clause to narrow down the rows returned. The two different conditions use the logical AND
operator to create a complex compound search condition.
Listing 8: Using two expressions with the AND operator
1 2 3 4 5 |
USE AdventureWorks2019; GO SELECT * FROM Sales.SalesTerritory WHERE SalesLastYear > 3000000 AND CountryRegionCode = 'CA'; |
Report 6 shows the output created when Listing 8 is run.
Report 6: Results when Listing 8 is executed
The basic T-SQL SELECT statement
The SELECT
statement is the most used statement in the T-SQL language. Understanding the basics of the SELECT
clause in the foundation of all other formats of the SELECT
statement. Identifying columns in the selection list, tables in the FROM
, and constraints in the WHERE
clause will allow you to return any subset of rows from a database table. I will discuss other more complex aspects of querying SQL Server data using the SELECT
statement in future articles.
Load comments