{"id":95263,"date":"2023-01-05T20:40:03","date_gmt":"2023-01-05T20:40:03","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=95263"},"modified":"2026-04-15T18:57:18","modified_gmt":"2026-04-15T18:57:18","slug":"introducing-the-mysql-common-table-expression","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/introducing-the-mysql-common-table-expression\/","title":{"rendered":"MySQL Common Table Expressions (CTEs): WITH Clause, Recursive Queries"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\" id=\"h-executive-summary\">Executive Summary<\/h2>\n\n\n\n<p><strong>A MySQL Common Table Expression (CTE) is a named temporary result set defined with a WITH clause that can be referenced within the same SELECT, INSERT, UPDATE, or DELETE statement. CTEs make complex queries more readable by giving meaningful names to intermediate result sets &#8211; replacing nested subqueries with named, sequential query steps. MySQL 8.0 introduced CTEs; they are not available in MySQL 5.x. This article covers basic CTEs, defining multiple CTEs in one WITH clause, recursive CTEs (for traversing hierarchical data like category trees or organisational charts), and using CTEs with DML statements.<\/strong><\/p>\n\n\n\n<p><strong>This article is part of Robert Sheldon's continuing series on Learning MySQL. To see all of the items in the series, <a href=\"https:\/\/www.red-gate.com\/simple-talk\/mysql-coding-basics\/\">click here<\/a>.<\/strong><\/p>\n\n\n\n\n<p>As with many relational database management systems, MySQL provides a variety of methods for combining data in a data manipulation language (DML) statement. You can join multiple tables in a single query or add subqueries that pull data in from other tables. You can also access views and temporary tables from within a statement, often along with permanent tables.<\/p>\n\n\n\n<p>MySQL also offers another valuable tool for working with data\u2014the common table expression (CTE). A CTE is a named result set that you define in a <code>WITH<\/code> clause. The <code>WITH<\/code> clause is associated with a single DML statement but is created outside the statement. However, only that statement can access the result set.<\/p>\n\n\n\n<p>In some cases, you can include a CTE with a <code>SELECT<\/code> statement that is embedded in another statement, as in the case of a subquery or a <code>DELETE\u2026SELECT<\/code> statement. But even then, the <code>WITH<\/code> clause is defined outside of that <code>SELECT<\/code> statement, and only that <code>SELECT<\/code> statement can access the result set.<\/p>\n\n\n\n<p>One way to think of a CTE is as a type of view with a very limited scope (one statement). Another way to think of a CTE is as a type of named subquery that is defined in a clause separate from the main query. However, a CTE is neither of these, and in this article, I explain how the CTE works and walk you through a number of examples that demonstrate the different ways you can use them to retrieve data.<\/p>\n\n\n\n<p>Note: the examples use the tables and data that is created in the last section of the article titled: \u201cAppendix: Preparing the demo objects and data\u201d.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-getting-started-with-common-table-expressions\">Getting started with common table expressions<\/h2>\n\n\n\n<p>A common table expression is defined inside a <code>WITH<\/code> clause. The clause precedes the main DML statement, which is sometimes referred to as the top-level statement. In addition, the clause can contain one or more CTE definitions, as shown in the following syntax:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH [RECURSIVE]\n  cte_name [(col_name [, col_name] ...)] AS (select_query)\n  [, cte_name [(col_name [, col_name] ...)] AS (select_query)] ...\ntop_level_statement;<\/pre>\n\n\n\n<p>If the <code>WITH<\/code> clause contains more than one CTE, you must separate them with commas, and you must assign a unique name to each CTE, although this applies only within the context of the <code>WITH<\/code> clause. For example, two <code>SELECT<\/code> statements can include CTEs with the same name because a CTE is limited to the scope of its associated top-level statement.<\/p>\n\n\n\n<p>The CTE name is followed by one or more optional column names, then the <code>AS<\/code> keyword, and finally a <code>SELECT<\/code> query enclosed in parentheses. If you specify column names, their number must match the number of columns returned by the <code>SELECT<\/code> query. If you don\u2019t specify column names, the column names returned by the <code>SELECT<\/code> query are used.<\/p>\n\n\n\n<p>Common table expressions are typically used with <code>SELECT<\/code> statements. However, you can also use them with <code>UPDATE<\/code> and <code>DELETE<\/code> statements, following the same syntax as shown above. In addition, you can include CTEs with your subqueries when passing them into your outer statements. You can also use CTEs in statements that support the use of <code>SELECT<\/code> as part of the statement definition. For example, you can add a <code>WITH<\/code> clause to the <code>SELECT<\/code> query in an <code>INSERT\u2026SELECT<\/code> statement or in a <code>CREATE<\/code> <code>TABLE\u2026SELECT<\/code> statement.<\/p>\n\n\n\n<p>For this article, I focus primarily on creating CTEs that use <code>SELECT<\/code> statements as their top-level statements because this is the most common way to use a CTE. This approach is also a good way to start learning about CTEs without putting any data at risk. You can then apply the fundamental principles you learn here to other types of statements as you become more comfortable with how CTEs work.<\/p>\n\n\n\n<p>With that in mind, let\u2019s start with a simple example. The following <code>SELECT<\/code> statement includes a <code>WITH<\/code> clause that defines a CTE named <code>planes<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH planes AS\n  (SELECT plane, engine_count, max_weight\n  FROM airplanes WHERE engine_type = 'jet');\n\nSELECT plane, max_weight FROM planes\nORDER BY max_weight DESC;<\/pre>\n\n\n\n<p>The <code>SELECT<\/code> query in the CTE retrieves all the planes in the <code>airplanes<\/code> table that have <code>jet<\/code> as the <code>engine_type<\/code>. The CTE\u2019s result set is made up of the data returned by the <code>SELECT<\/code> query and can be accessed by the top-level <code>SELECT<\/code> statement.<\/p>\n\n\n\n<p>The top-level statement retrieves the data directly from the CTE, similar to how a statement might retrieve data from a view. The main difference is that the view definition is persisted to the database and can be used by anyone with adequate privileges. The CTE, on the other hand, has a very limited scope and can be accessed only within the context of the top-level statement.<\/p>\n\n\n\n<p>In this case, the top-level <code>SELECT<\/code> statement retrieves only the <code>plane<\/code> and <code>max_weight<\/code> columns from the CTE and orders the results by the <code>max_weight<\/code> column, in descending order. The following figure shows the results returned by the statement.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"850\" height=\"806\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/01\/word-image-30.png\" alt=\"\" class=\"wp-image-95273\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Of course, you can easily achieve the same results without the use of a CTE by querying the <code>airplanes<\/code> table directly:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT plane, max_weight \nFROM airplanes\nWHERE engine_type = 'jet'\nORDER BY max_weight DESC;<\/pre>\n\n\n\n<p>However, I wanted to demonstrate the basic components that go into a CTE and how you can access that CTE from within the top-level <code>SELECT<\/code> statement. Both the CTE and top-level statement can certainly be much more complex\u2014and usually are\u2014but the principles remain the same.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-working-with-ctes-in-the-top-level-select-statement\">Working with CTEs in the top-level <code><strong>SELECT<\/strong><\/code> statement<\/h2>\n\n\n\n<p>As mentioned earlier, a CTE is basically a named result set. When you query the CTE from within the top-level statement, the data is returned in a tabular format, similar to what you get when you query a view, permanent table, temporary table, or derived table (such as that produced by a subquery in a <code>SELECT<\/code> statement\u2019s <code>FROM<\/code> clause). This means that you can work with the CTE in much the same way as you can those other object types. For example, one common approach to referencing a CTE within the top-level query is to join it with another table, as in the following example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH mfcs AS\n  (SELECT manufacturer_id, manufacturer FROM manufacturers)\nSELECT a.plane, m.manufacturer, a.max_weight\nFROM airplanes a INNER JOIN mfcs m\n  ON a.manufacturer_id = m.manufacturer_id\nWHERE a.engine_type = 'jet'\nORDER BY a.max_weight DESC;<\/pre>\n\n\n\n<p>The <code>WITH<\/code> clause defines a single CTE named <code>mfcs<\/code>. The CTE\u2019s <code>SELECT<\/code> query returns the <code>manufacturer_id<\/code> and <code>manufacturer<\/code> values from the <code>manufacturers<\/code> table. The top-level <code>SELECT<\/code> statement then joins the <code>airplanes<\/code> table to the <code>mfcs<\/code> CTE, based on the <code>manufacturer_id<\/code> column in each one. The following figure shows the statement\u2019s results.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1098\" height=\"812\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/01\/word-image-31.png\" alt=\"\" class=\"wp-image-95274\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>As this example demonstrates, you can treat the CTE much like any other table structure in your top-level query. However, just as you saw earlier, you can also recast this statement without a CTE by joining the <code>airplanes<\/code> table directly to the <code>manufacturers<\/code> table:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT a.plane, m.manufacturer, a.max_weight\nFROM airplanes a INNER JOIN manufacturers m\n  ON a.manufacturer_id = m.manufacturer_id\nWHERE a.engine_type = 'jet'\nORDER BY a.max_weight DESC;<\/pre>\n\n\n\n<p>Because there is so little data, the difference in performance between the two statements is negligible, but that might not be the case with a much larger data set. However, as is often the case with MySQL, it can be difficult to know which approach is best without running both statements against a realistic data set. Even then there might not be a significant difference in performance, in which case, it might come down to developer preference.<\/p>\n\n\n\n<p>As noted earlier, MySQL often supports multiple ways to achieve the same results, as the preceding examples demonstrate. Common table expressions can sometimes help simplify code and make it more readable, which is a big point in its favor, but performance should generally be the overriding consideration.<\/p>\n\n\n\n<p>Comparing different approaches usually requires that you test them on a valid data set, in part, because it can be difficult to find specific recommendations when comparing approaches. Not many database developers, for example, would be willing to say that you should <em>always<\/em> use CTEs rather than inner joins in <em>all<\/em> circumstances, or vice versa.<\/p>\n\n\n\n<p>That said, you might come across recommendations that are less rigid and are perhaps worth considering, such as when comparing CTEs with subqueries. For instance, a CTE is often considered to be a better option if your <code>SELECT<\/code> statement includes multiple subqueries retrieving the same data, as in the following example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">SELECT manufacturer_id, plane_id, plane, max_weight,\n  (SELECT ROUND(AVG(max_weight)) FROM airplanes a2\n    WHERE a.manufacturer_id = a2.manufacturer_id) AS avg_weight,\n  (max_weight - (SELECT ROUND(AVG(max_weight)) FROM airplanes a2\n    WHERE a.manufacturer_id = a2.manufacturer_id)) AS amt_over\nFROM airplanes a\nWHERE max_weight &gt; \n  (SELECT ROUND(AVG(max_weight)) FROM airplanes a2\n    WHERE a.manufacturer_id = a2.manufacturer_id);<\/pre>\n\n\n\n<p>If this statement looks familiar to you, that\u2019s because I pulled it from <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/subqueries-in-mysql\/\">my previous article in this series, which covers subqueries.<\/a> As you can see, the statement includes three subqueries, all of them the same, which can result in a fair amount of redundant processing effort, depending on how the database engine chooses to handle the query. The following figure shows the results returned by this statement.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1372\" height=\"564\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/01\/word-image-32.png\" alt=\"\" class=\"wp-image-95275\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Instead of using subqueries, you can achieve the same results by defining a CTE that retrieves the average <code>max_weight<\/code> value for each manufacturer. Then, in your top-level query, you can join the <code>airplanes<\/code> table to the CTE, basing the join on the manufacturer IDs, as shown in the following example:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH mfc_weights (id, avg_weight) AS\n  (SELECT manufacturer_id, ROUND(AVG(max_weight))\n  FROM airplanes GROUP BY manufacturer_id)\nSELECT a.manufacturer_id, a.plane_id, a.plane, \n  a.max_weight, m.avg_weight,\n  (a.max_weight - m.avg_weight) AS amt_over\nFROM airplanes a INNER JOIN mfc_weights m \n  ON a.manufacturer_id = m.id\nWHERE max_weight &gt; m.avg_weight;<\/pre>\n\n\n\n<p>In this case, the CTE specifies the column names to use for the result set, so the <code>manufacturer_id<\/code> values are returned as the <code>id<\/code> column. Additionally, the CTE groups the data in the <code>manufacturers<\/code> table by the <code>manufacturer_id<\/code> values and provides the average <code>avg_weight<\/code> value for each one.<\/p>\n\n\n\n<p>The top-level query then joins the <code>airplanes<\/code> table to the CTE but limits the results to those airplanes with a <code>max_weight<\/code> value greater than the average weight returned by the CTE. Notice that, in place of the subqueries, the statement now uses the <code>avg_weight<\/code> column from the CTE.<\/p>\n\n\n\n<p>Once again, the performance difference between these two approaches is negligible because we\u2019re working with such a small data set. Only by running the statements against a more realist data set can we get a true picture of their performance differences. In my opinion, however, the CTE makes the code more readable, that is, it\u2019s easier to follow the statement\u2019s logic.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-defining-multiple-ctes-in-one-with-clause\">Defining multiple CTEs in one <code><strong>WITH<\/strong><\/code> clause<\/h2>\n\n\n\n<p>Up to this point, the examples in this article included only one CTE per <code>WITH<\/code> clause, but you can define multiple CTEs and reference any of them in your top-level statement. Just make sure that you assign different names to the CTEs and separate them with commas. For example, the following <code>WITH<\/code> clause defines three CTEs, which are all referenced in the top-level <code>SELECT<\/code> statement:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH \n  jets AS\n    (SELECT plane, engine_type, engine_count, max_weight\n    FROM airplanes WHERE engine_type = 'jet'),  \n  turbos AS\n    (SELECT plane, engine_type, engine_count, max_weight\n    FROM airplanes WHERE engine_type = 'turboprop'),\n  pistons AS\n    (SELECT plane, engine_type, engine_count, max_weight\n    FROM airplanes WHERE engine_type = 'piston')\nSELECT * FROM jets\nUNION ALL\nSELECT * FROM turbos\nUNION ALL\nSELECT * FROM pistons;<\/pre>\n\n\n\n<p>The three CTEs are similar except that they each pull data based on a different <code>engine_type<\/code> value. The top-level <code>SELECT<\/code> query then uses the <code>UNION<\/code> operator to join them altogether. (The <code>UNION ALL<\/code> operator combines the results from multiple <code>SELECT<\/code> statements into a single result set.) The following figures shows part of the results returned by this statement.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1056\" height=\"760\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/01\/word-image-33.png\" alt=\"\" class=\"wp-image-95276\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>This is a fairly basic example, but it demonstrates the concept of defining multiple CTEs and referencing them in the top-level query. In this case, the three CTEs operate independently of each other, but you don\u2019t always have to take this approach. For instance, the <code>WITH<\/code> clause in the following example also includes three CTEs, but in this case, the second CTE (<code>mfc_avg<\/code>) references the first CTE (<code>mfcs<\/code>), while the third CTE (<code>pl_avg<\/code>) stands alone:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH\n  mfcs (id, mfc) AS\n    (SELECT manufacturer_id, manufacturer FROM manufacturers),\n  mfc_avg (id, mfc, avg_parking) AS\n    (SELECT m.id, m.mfc, ROUND(AVG(a.parking_area))\n    FROM mfcs m INNER JOIN airplanes a \n      ON m.id = a.manufacturer_id\n    GROUP BY manufacturer_id),\n  pl_avg (avg_all) AS\n    (SELECT ROUND(AVG(parking_area)) FROM airplanes)\nSELECT id, mfc, avg_parking\nFROM mfc_avg m\nWHERE avg_parking &gt; (SELECT avg_all FROM pl_avg);<\/pre>\n\n\n\n<p>As this example demonstrates, a CTE can reference a CTE that comes before it. However, this works in one direction only; a CTE cannot reference one that comes after it. In this case, the <code>mfc_avg<\/code> CTE joins the <code>airplanes<\/code> table to the <code>mfcs<\/code> CTE and groups the data based on the <code>manufacturer_id<\/code> value. The top-level query then retrieves data from this CTE, but returns only those rows with an <code>avg_parking<\/code> value greater than the average returned by the <code>pl_avg<\/code> CTE. The following figure shows the results returned by this statement.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"636\" height=\"262\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/01\/word-image-34.png\" alt=\"\" class=\"wp-image-95277\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Something worth emphasizing is that the <code>WHERE<\/code> clause in the top-level query includes a subquery that retrieves data from the <code>pl_avg<\/code> CTE. Not only does this point to the inherent flexibility of CTEs, but also to the fact that CTEs and subqueries are not mutually exclusive.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-working-with-recursive-ctes\">Working with recursive CTEs<\/h2>\n\n\n\n<p>One of the most useful aspects of the CTE is its ability to perform recursive queries. This type of CTE\u2014known as the recursive CTE\u2014is one that references itself within the CTE\u2019s query. The <code>WITH<\/code> clause in a recursive CTE must include the <code>RECURSIVE<\/code> keyword, and the CTE\u2019s query must include two parts that are separated by the <code>UNION<\/code> operator. The first (nonrecursive) part populates an initial row of data, and the second (recursive) part carries out the actual recursion based on that first row. Only the recursive part can refer to the CTE itself.<\/p>\n\n\n\n<p>To help understand how this work, consider the following example, which generates a list of even numbers up to and including 20:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH RECURSIVE counter (val) AS\n  (SELECT 2\n  UNION ALL\n  SELECT val + 2 FROM counter WHERE val &lt; 20)\nSELECT * FROM counter;<\/pre>\n\n\n\n<p>The name of the CTE is <code>counter<\/code>, and it returns only the <code>val<\/code> column. The nonrecursive part of the CTE\u2019s query sets the value of the first row as <code>2<\/code>, which is assigned to the <code>val<\/code> column. The recursive part of the query retrieves the data from the CTE but increments the <code>val<\/code> column by <code>2<\/code> with each iteration. The query continues to increment the <code>val<\/code> column by <code>2<\/code> as long as <code>val<\/code> is less than <code>20<\/code>. The top-level <code>SELECT<\/code> statement then retrieves the data from the CTE, returning the results shown in the following figure.<\/p>\n\n\n\n<p>Note: Since the recursive query says &lt; 20, you might be inclined to think it would not return 20 in the output. But the iteration that returns 20 does occur, but it stops iterating because the value is not less than 20 anymore.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"280\" height=\"460\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/01\/word-image-35.png\" alt=\"\" class=\"wp-image-95278\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>When building a recursive CTE, be aware that MySQL places several limitations on the recursive part. The second <code>SELECT<\/code> statement cannot contain aggregate functions, window functions, the <code>DISTINCT<\/code> keyword, or the <code>GROUP<\/code> <code>BY<\/code> or <code>ORDER<\/code> <code>BY<\/code> clause.<\/p>\n\n\n\n<p>Performing a recursive query can be useful when working with hierarchical data. To demonstrate how this works, I used the following code to create and populate a table named <code>airline_emps<\/code>, which stores the IDs and job titles for a group of fictitious employees, along with the ID of the person each one reports to:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">CREATE TABLE airline_emps (\n  emp_id INT UNSIGNED NOT NULL,\n  job_title varchar(50) NOT NULL,\n  reports_to INT UNSIGNED NULL,\n  create_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,\n  last_update TIMESTAMP NOT NULL \n    DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\n  PRIMARY KEY (emp_id),\n  CONSTRAINT fk_emp_id FOREIGN KEY (reports_to) \n    REFERENCES airline_emps (emp_id) );\n\nINSERT INTO airline_emps\n  (emp_id, job_title, reports_to)\nVALUES\n  (1, 'big boss', NULL),\n  (2, 'divsional boss', 1),\n  (3, 'supervisor', 2),  \n  (4, 'chief', 2),\n  (5, 'designer', 3),\n  (6, 'top exec', 1),\n  (7, 'overseer', 6),\n  (8, 'team leader', 6),\n  (9, 'organizer', 8),\n  (10, 'controller', 8),\n  (11, 'specialist', 10),\n  (12, 'analyst', 9);\n\nSELECT * FROM airline_emps;<\/pre>\n\n\n\n<p>Everyone except for <code>emp_id<\/code> <code>1<\/code> (big boss) reports to another individual. For example, the overseer reports to the top exec, who in turn reports to the big boss. I included a <code>SELECT<\/code> statement along with the <code>INSERT<\/code> statement so you can verify the data after it\u2019s been added to the table.<\/p>\n\n\n\n<p>With this data in place, you can now create the following recursive CTE, which finds each person\u2019s position level in the company and how that position fits into the reporting hierarchy:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH RECURSIVE emps AS\n  (SELECT emp_id, job_title, reports_to, \n    1 AS emp_tier, CAST(emp_id AS CHAR(50)) AS emp_path\n  FROM airline_emps\n  WHERE reports_to IS NULL\n  UNION ALL\n  SELECT a.emp_id, a.job_title, a.reports_to, \n    e.emp_tier + 1, \n    CONCAT(a.emp_id, ' \/ ', e.emp_path)\n  FROM airline_emps a INNER JOIN emps e\n    ON a.reports_to = e.emp_id)\nSELECT * FROM emps\nORDER BY emp_tier, emp_id;<\/pre>\n\n\n\n<p>Because this is a recursive CTE, it is separated in two parts that are connected with the <code>UNION ALL<\/code> operator. The nonrecursive part populates the first row, based on the <code>NULL<\/code> value in the <code>reports_to<\/code> column. This row is for the big boss, who is at the top of the hierarchy. The nonrecursive part also assigns the value <code>1<\/code> to the <code>emp_tier<\/code> column and assigns the <code>emp_id<\/code> value to the <code>emp_path<\/code> column, converting the value to the <code>CHAR<\/code> data type. The first row returned by the CTE looks similar to that shown in the following figure.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"902\" height=\"200\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/01\/word-image-36.png\" alt=\"\" class=\"wp-image-95279\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The recursive part of the CTE uses an inner join to match the <code>airline_emps<\/code> table to the <code>emps<\/code> CTE. The join is based on the <code>reports_to<\/code> column in the <code>airline_emps<\/code> table and the <code>emp_id<\/code> column in the <code>emps<\/code> CTE. The join condition makes it possible to recurse through each level of the reporting hierarchy, based on the <code>reports_to<\/code> value. The recursive part then increments the <code>emp_tier<\/code> column by <code>1<\/code> with each new level in the hierarchy.<\/p>\n\n\n\n<p>For example, <code>emp_id<\/code> <code>2<\/code> (divisional boss) and <code>emp_id<\/code> <code>6<\/code> (top exec) both report directly to <code>emp_id<\/code> <code>1<\/code> (big boss), so the <code>emp_tier<\/code> column for these two rows is incremented by <code>1<\/code>, resulting in a value of <code>2<\/code> for each row. This means that they\u2019re both in the second tier of the employee hierarchy. The next layer in the hierarchy are those individuals who report to the divisional boss or top exec, so the <code>emp_tier<\/code> column for these rows is set to <code>3<\/code>. This process continues until there are no tiers left.<\/p>\n\n\n\n<p>During this process, the <code>emp_path<\/code> column is also updated in each row by concatenating the <code>emp_id<\/code> vales to provide a representation of the reporting hierarchy. For instance, the <code>reports_to<\/code> column for <code>emp_id<\/code> <code>9<\/code> will show that the organizer reports to <code>emp_id<\/code> <code>8<\/code> (team leader), who reports to <code>emp_id<\/code> <code>6<\/code> (top exec), who reports to <code>emp_id<\/code> <code>1<\/code> (big boss), with each layer separated by a forward slash. The following figure shows the data returned by the query.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"890\" height=\"534\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/01\/word-image-37.png\" alt=\"\" class=\"wp-image-95280\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>The top-level <code>SELECT<\/code> statement retrieves data only from the CTE, without joining to any other tables. The statement also includes an <code>ORDER<\/code> <code>BY<\/code> clause that sorts the results first by the <code>emp_tier<\/code> column and then by the <code>emp_id<\/code> column.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-using-ctes-with-dml-statements\">Using CTEs with DML statements<\/h2>\n\n\n\n<p>Earlier in the article, I mentioned that you can use CTEs with statements other than <code>SELECT<\/code>. I also stated that my focus in this article was primarily on how the CTE is implemented with the <code>SELECT<\/code> statement. However, I want to show you at least one of the alternative forms so you get a sense of what that might look like (and to whet your appetite a bit).<\/p>\n\n\n\n<p>The following example shows a CTE used with an <code>UPDATE<\/code> statement to modify the data in the <code>airline_emps<\/code> table created in the previous section:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH rpts AS\n  (SELECT emp_id FROM airline_emps\n  WHERE reports_to = 8)\nUPDATE airline_emps\nSET reports_to = 7\nWHERE emp_id IN (SELECT * FROM rpts);<\/pre>\n\n\n\n<p>The <code>WITH<\/code> clause and CTE work just like you saw in other examples. The clause includes a single CTE named <code>rpts<\/code> that retrieves the <code>emp_id<\/code> values for those employees who report to <code>emp_id<\/code> <code>8<\/code>. The query returns the values <code>9<\/code> and <code>10<\/code>.<\/p>\n\n\n\n<p>The top-level <code>UPDATE<\/code> statement uses the data returned by the CTE to update the <code>reports_to<\/code> column to <code>7<\/code> for those two employees. The <code>UPDATE<\/code> statement\u2019s <code>WHERE<\/code> clause includes a subquery that retrieves the data from the CTE, so the statement will update only those two rows.<\/p>\n\n\n\n<p>After you run this update statement, you can rerun the <code>SELECT<\/code> statement from the previous section to verify the changes. I\u2019ve included the statement here for your convenience:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">WITH RECURSIVE emps AS\n  (SELECT emp_id, job_title, reports_to, \n    1 AS emp_tier, CAST(emp_id AS CHAR(50)) AS emp_path\n  FROM airline_emps\n  WHERE reports_to IS NULL\n  UNION ALL\n  SELECT a.emp_id, a.job_title, a.reports_to, \n    e.emp_tier + 1, \n    CONCAT(a.emp_id, ' \/ ', e.emp_path)\n  FROM airline_emps a INNER JOIN emps e\n    ON a.reports_to = e.emp_id)\nSELECT * FROM emps\nORDER BY emp_tier, emp_id;<\/pre>\n\n\n\n<p>The statement returns the results shown in the following figure.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"814\" height=\"470\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/01\/word-image-38.png\" alt=\"\" class=\"wp-image-95281\"\/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>Notice that employees <code>9<\/code> and <code>10<\/code> now show a <code>reports_to<\/code> value of <code>7<\/code>. In addition, the <code>emp_path<\/code> value for each of the two rows has been updated to reflect the new reporting hierarchy.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-working-with-mysql-common-table-expressions\">Working with MySQL common table expressions<\/h2>\n\n\n\n<p>The CTE can be a powerful tool when querying and modifying data in your MySQL databases. Recursive CTEs can be particularly useful when working with self-referencing data, as the earlier examples demonstrated. But CTEs are not always intuitive, and you should have a good understanding of how they work before you start adding them to your current database code, especially if you want to use them to modify data. For this reason, I recommend that you also review the <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/with.html\">MySQL documentation<\/a> on CTEs, along with any other resources you have available. The more time you invest upfront in learning about CTEs, the more effectively you\u2019ll be able to utilize them in your MySQL queries.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-appendix-preparing-the-demo-objects-and-data\">Appendix: Preparing the demo objects and data<\/h2>\n\n\n\n<p>For the examples in this article, I created the <code>travel<\/code> database and added the <code>manufacturers<\/code> and <code>airplanes<\/code> to the database. If you want to try out these examples for yourself, start by running the following script on your MySQL instance:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">DROP DATABASE IF EXISTS travel;\nCREATE DATABASE travel;\nUSE travel;\nCREATE TABLE manufacturers (\n  manufacturer_id INT UNSIGNED NOT NULL,\n  manufacturer VARCHAR(50) NOT NULL,\n  create_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,\n  last_update TIMESTAMP NOT NULL \n    DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\n  PRIMARY KEY (manufacturer_id) );\nCREATE TABLE airplanes (\n  plane_id INT UNSIGNED NOT NULL,\n  plane VARCHAR(50) NOT NULL,\n  manufacturer_id INT UNSIGNED NOT NULL,\n  engine_type VARCHAR(50) NOT NULL,\n  engine_count TINYINT NOT NULL,\n  max_weight MEDIUMINT UNSIGNED NOT NULL,\n  wingspan DECIMAL(5,2) NOT NULL,\n  plane_length DECIMAL(5,2) NOT NULL,\n  parking_area INT GENERATED ALWAYS AS ((wingspan * plane_length)) STORED,\n  icao_code CHAR(4) NOT NULL,\n  create_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,\n  last_update TIMESTAMP NOT NULL \n    DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,\n  PRIMARY KEY (plane_id),\n  CONSTRAINT fk_manufacturer_id FOREIGN KEY (manufacturer_id) \n    REFERENCES manufacturers (manufacturer_id) );<\/pre>\n\n\n\n<p>After you\u2019ve created the database, you can add sample data to the <code>manufacturers<\/code> table by running the following <code>INSERT<\/code> statement, which inserts seven rows:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">INSERT INTO manufacturers (manufacturer_id, manufacturer)\nVALUES (101,'Airbus'), (102,'Beagle Aircraft Limited'), (103,'Beechcraft'), \n  (104,'Boeing'), (105,'Bombardier'), (106,'Cessna'), (107,'Embraer');\nSELECT * FROM manufacturers;<\/pre>\n\n\n\n<p>I included a <code>SELECT<\/code> statement after the <code>INSERT<\/code> statement so you can confirm that the seven rows have been added to the table. The first row in the table has a <code>manufacturer_id<\/code> value of <code>101<\/code>, and the subsequent rows are incremented by one. After you\u2019ve populated the <code>manufacturers<\/code> table, you can run the following <code>INSERT<\/code> statement to add data to the <code>airplanes<\/code> table:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">INSERT INTO airplanes \n  (plane_id, plane, manufacturer_id, engine_type, engine_count, \n    wingspan, plane_length, max_weight, icao_code)\nVALUES\n  (1001,'A340-600',101,'Jet',4,208.17,247.24,837756,'A346'),\n  (1002,'A350-800 XWB',101,'Jet',2,212.42,198.58,546700,'A358'),\n  (1003,'A350-900',101,'Jet',2,212.42,219.16,617295,'A359'),\n  (1004,'A380-800',101,'Jet',4,261.65,238.62,1267658,'A388'),\n  (1005,'A380-843F',101,'Jet',4,261.65,238.62,1300000,'A38F'),\n  (1006,'A.109 Airedale',102,'Piston',1,36.33,26.33,2750,'AIRD'),\n  (1007,'A.61 Terrier',102,'Piston',1,36,23.25,2400,'AUS6'),\n  (1008,'B.121 Pup',102,'Piston',1,31,23.17,1600,'PUP'),\n  (1009,'B.206',102,'Piston',2,55,33.67,7500,'BASS'),\n  (1010,'D.5-108 Husky',102,'Piston',1,36,23.17,2400,'D5'),\n  (1011,'Baron 56 TC Turbo Baron',103,'Piston',2,37.83,28,5990,'BE56'),\n  (1012,'Baron 58 (and current G58)',103,'Piston',2,37.83,29.83,5500,'BE58'),\n  (1013,'Beechjet 400 (same as MU-300-10 Diamond II)',103,'Jet',2,43.5,48.42,15780,'BE40'),\n  (1014,'Bonanza 33 (F33A)',103,'Piston',1,33.5,26.67,3500,'BE33'),\n  (1015,'Bonanza 35 (G35)',103,'Piston',1,32.83,25.17,3125,'BE35'),\n  (1016,'747-8F',104,'Jet',4,224.42,250.17,987000,'B748'),\n  (1017,'747-SP',104,'Jet',4,195.67,184.75,696000,'B74S'),\n  (1018,'757-300',104,'Jet',2,124.83,178.58,270000,'B753'),\n  (1019,'767-200',104,'Jet',2,156.08,159.17,315000,'B762'),\n  (1020,'767-200ER',104,'Jet',2,156.08,159.17,395000,'B762'),\n  (1021,'Learjet 24',105,'Jet',2,35.58,43.25,13000,'LJ24'),\n  (1022,'Learjet 24A',105,'Jet',2,35.58,43.25,12499,'LJ24'),\n  (1023,'Challenger (BD-100-1A10) 350',105,'Jet',2,69,68.75,40600,'CL30'),\n  (1024,'Challenger (CL-600-1A11) 600',105,'Jet',2,64.33,68.42,36000,'CL60'),\n  (1025,'Challenger (CL-600-2A12) 601',105,'Jet',2,64.33,68.42,42100,'CL60'),\n  (1026,'414A Chancellor',106,'Piston',2,44.17,36.42,6750,'C414'),\n  (1027,'421C Golden Eagle',106,'Piston',2,44.17,36.42,7450,'C421'),\n  (1028,'425 Corsair-Conquest I',106,'Turboprop',2,44.17,35.83,8600,'C425'),\n  (1029,'441 Conquest II',106,'Turboprop',2,49.33,39,9850,'C441'),\n  (1030,'Citation CJ1 (Model C525)',106,'Jet',2,46.92,42.58,10600,'C525'),\n  (1031,'EMB 175 LR',107,'Jet',2,85.33,103.92,85517,'E170'),\n  (1032,'EMB 175 Standard',107,'Jet',2,85.33,103.92,82673,'E170'),\n  (1033,'EMB 175-E2',107,'Jet',2,101.67,106,98767,'E170'),\n  (1034,'EMB 190 AR',107,'Jet',2,94.25,118.92,114199,'E190'),\n  (1035,'EMB 190 LR',107,'Jet',2,94.25,118.92,110892,'E190');\nSELECT * FROM airplanes;<\/pre>\n\n\n\n<p>The <code>INSERT<\/code> statement uses the <code>manufacturer_id<\/code> values from the <code>manufacturers<\/code> table. These values provide the foreign key values needed for the <code>manufacturer_id<\/code> column in the <code>airplanes<\/code> table. In addition, the first row is assigned <code>1001<\/code> for the <code>plane_id<\/code> value, with the <code>plane_id<\/code> values for the other rows incremented accordingly. As with the previous <code>INSERT<\/code> statement, I\u2019ve included a <code>SELECT<\/code> statement for confirming that the data has been properly added.<\/p>\n\n\n\n<section id=\"faq\" class=\"faq-block my-5xl\">\n    <h2>FAQs: Introducing the MySQL common table expression<\/h2>\n\n                        <h3 class=\"mt-4xl\">1. What is a MySQL CTE and how do I create one?<\/h3>\n            <div class=\"faq-answer\">\n                <p>A CTE (Common Table Expression) is a temporary, named result set created with the WITH keyword before a SELECT statement: WITH cte_name AS (SELECT id, name FROM products WHERE active = 1) SELECT * FROM cte_name WHERE name LIKE &#8216;A%&#8217;; The CTE (cte_name) is only available within the query that immediately follows the WITH clause. CTEs improve readability over subqueries by giving the intermediate result a descriptive name. Multiple CTEs can be defined in one WITH clause, separated by commas.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">2. What is the difference between a MySQL CTE and a subquery?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Both CTEs and subqueries define temporary result sets used within a query, but CTEs have three advantages: (1) Readability &#8211; a named CTE is easier to understand than an inline subquery, especially when the logic is complex; (2) Reusability &#8211; a CTE can be referenced multiple times within the same query, whereas a subquery must be repeated; (3) Recursive capability &#8211; CTEs support recursive queries using the RECURSIVE keyword, which subqueries cannot do. For simple one-use cases, a subquery is equally valid; for complex queries with repeated logic or hierarchical traversal, CTEs are superior.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">3. How do I write a recursive CTE in MySQL?<\/h3>\n            <div class=\"faq-answer\">\n                <p>Define the CTE with the RECURSIVE keyword and use UNION ALL to combine the anchor member (base case) and the recursive member (the self-referencing part): WITH RECURSIVE category_tree AS (SELECT id, name, parent_id FROM categories WHERE parent_id IS NULL UNION ALL SELECT c.id, c.name, c.parent_id FROM categories c JOIN category_tree ct ON c.parent_id = ct.id) SELECT * FROM category_tree; The anchor member returns root rows; the recursive member joins back to the CTE itself to retrieve children. MySQL applies a default recursion depth limit of 1,000 levels.<\/p>\n            <\/div>\n                    <h3 class=\"mt-4xl\">4. What MySQL version supports CTEs?<\/h3>\n            <div class=\"faq-answer\">\n                <p>CTEs (both regular and recursive) were introduced in MySQL 8.0. They are not available in MySQL 5.7 or earlier. If you need CTE-like functionality in MySQL 5.7, use derived tables (subqueries in the FROM clause) as an alternative &#8211; they produce equivalent results for non-recursive cases. Recursive queries in MySQL 5.7 require stored procedures with iterative logic or application-level tree traversal.<\/p>\n            <\/div>\n            <\/section>\n","protected":false},"excerpt":{"rendered":"<p>Use MySQL common table expressions (CTEs) to simplify complex queries with the WITH clause. Covers basic CTEs, multiple CTEs in one WITH clause, recursive CTEs for hierarchical data, and CTEs with INSERT\/UPDATE\/DELETE.&hellip;<\/p>\n","protected":false},"author":221841,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":true,"footnotes":""},"categories":[53,145792],"tags":[],"coauthors":[6779],"class_list":["post-95263","post","type-post","status-publish","format-standard","hentry","category-featured","category-mysql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/95263","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221841"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=95263"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/95263\/revisions"}],"predecessor-version":[{"id":109838,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/95263\/revisions\/109838"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=95263"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=95263"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=95263"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=95263"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}