{"id":97259,"date":"2023-07-20T15:53:30","date_gmt":"2023-07-20T15:53:30","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=97259"},"modified":"2023-07-07T15:56:43","modified_gmt":"2023-07-07T15:56:43","slug":"optimizing-queries-in-mysql-optimizing-updates","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/optimizing-queries-in-mysql-optimizing-updates\/","title":{"rendered":"Optimizing Queries in MySQL: Optimizing Updates"},"content":{"rendered":"<p>In the previous parts of these MySQL optimization series, we\u2019ve told you how queries work on a high level, then dived deeper into the power of <code>SELECT<\/code> and <code>INSERT<\/code> statements. In this blog, I will cover some of the ways to optimize modifying your data too.<\/p>\n<h2>UPDATE Queries \u2013 the Basics<\/h2>\n<p>As the name of the statement suggests, UPDATE queries provide us with the ability to make changes to our existing data. The <code>UPDATE<\/code> statement modifies the values of rows in a table and in its most basic form, looks like so:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">UPDATE demo_table \r\nSET [column] = [\u2018value\u2019] \r\nWHERE [specific details] [LIMIT x]<\/pre>\n<p>Where:<\/p>\n<ul>\n<li>The column after the <code>SET<\/code> option defines the specific column we want to update. It\u2019s worth noting that we can update the values of multiple columns as well by specifying them after we specify the value of the first column.<\/li>\n<li><code>[specific details]<\/code> refer to the details after the <code>WHERE<\/code> clause. This part of the query is mostly used to update a specific set of columns matching a given condition, for example <code>WHERE id &gt; 500<\/code> would update all of the rows with an ID higher than 500. These conditions are known as the predicate of the query.<\/li>\n<li><code>[LIMIT x]<\/code> allows us to update only the specified number of rows (<code>LIMIT 100<\/code> would only update 100 rows, 200 would update 200 rows, etc.) We can also specify an offset to start from: <code>LIMIT 100,200<\/code> would update 100 rows starting from the 100th row. An <code>ORDER<\/code> BY option will order the results according to a given clause, add an <code>ASC<\/code>|<code>DESC<\/code> option and you will be able to sort the rows in an ascending or a descending order.<\/li>\n<\/ul>\n<p>These are the basics of the <code>UPDATE<\/code> statement \u2013 as you can tell, these queries are nothing fancy, but their performance can be made better by following a couple of tips.<\/p>\n<h2>Optimizing UPDATE Queries<\/h2>\n<p>Most MySQL DBAs know that indexes make <code>UPDATE <\/code>statements slow \u2013 however, while this statement is true, there\u2019s much more to updating data than just that. When updating data, we need to keep the following things in mind:<\/p>\n<ul>\n<li><strong>Indexing<\/strong> \u2013 Indexes have positive and negative effects on <code>UPDATE<\/code> statements. This is because an <code>UPDATE<\/code> is logically a two-step process. One to find the row to modify, the another to modify the data.\n<ul>\n<li><strong>Positive <\/strong>\u2013 At the same time, if you are updating a small number of rows (most <code>UPDATE<\/code> statements only affect 1 row, usually accessed by a primary key indexed value), indexes improve performance by improving access to the rows that need to be modified.<\/li>\n<li><strong>Negative<\/strong> &#8211; Indexes can slow <code>UPDATE<\/code> statements down when you modify columns that are indexed. This is because when an index is in place and an <code>UPDATE<\/code> query is running, the <code>UPDATE<\/code> needs to update the data in the index <em>and<\/em> the data itself \u2013 that\u2019s some additional overhead right then and there.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Partitions<\/strong> \u2013 as a rule, partitions slow down <code>UPDATE<\/code> and <code>DELETE<\/code> operations making <code>SELECT<\/code> statements faster in return, and they also have a couple of caveats unique to themselves:\n<ul>\n<li><strong>Partitioning integers<\/strong> \u2013 Suppose that we partition tables by range and a partition A holds integers that are less than 1000, and a partition B holds integers from 1001 to 2000. Then, suppose that we update our data with a query like so: <br \/>\n<code>UPDATE `demo_table` <br \/>\nSET `partitioned_column` = 1500<\/code>; <br \/>\nSuch a query means that if the values in the column originally resided in the partition A, they would now be located in the partition B \u2013 <em>the update would switch the partitioned column from partition A to partition B.<\/em> For some, that may be a source of confusion, so it\u2019s useful to keep that in mind.<\/li>\n<li><strong>Beware of NULL<\/strong> values\u2013 if you insert a row having the value of <code>NULL<\/code> into a specific partition and then want to update your partitions, beware that the row will reside in the lowest partition possible. For more information on how partitioning handles <code>NULL<\/code> values, <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/5.7\/en\/partitioning-handling-nulls.html\">refer to the documentation<\/a> on handling NULL values in partitioning.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Locking<\/strong> \u2013 if we lock the table, perform many updates one by one (we can use the <code>LIMIT<\/code> or <code>WHERE<\/code> clauses to achieve this goal), and then unlock the table, the speed of the queries will likely be much faster than running a single <code>UPDATE<\/code> query that updates many rows at once. Such an approach would look like this (replace <em>x<\/em> with the name of your <em>table<\/em> and <em>column<\/em> with the name of your column. Feel free to update as many rows in one go as you want):<\/li>\n<\/ul>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">LOCK TABLE [WRITE|READ] x;\r\n\r\nUPDATE x SET column = \u2018value\u2019 LIMIT 0,50000;\r\nUPDATE x SET column = \u2018value\u2019 LIMIT 50000,100000;\r\nUPDATE x SET column = \u2018value\u2019 LIMIT 100000,150000;\r\n...\r\nUNLOCK TABLE x;<\/pre>\n<p>It is also worth noting that locks have two types &#8211; one can either lock the table for writing (<code>INSERT<\/code> queries) or for reading (<code>SELECT<\/code> queries) if clauses <code>WRITE<\/code> or <code>READ<\/code> are specified. Locking tables for reads or writes means just that \u2013 all <code>INSERT<\/code> or <code>SELECT<\/code> operations will fail to complete if a specific (writing or reading) lock is in place and everything will be OK once the lock is released (once the table is unlocked.)<\/p>\n<p>Also, keep in mind that if you use the MyISAM engine and use <code>LOCK TABLE<\/code> queries in the same fashion, your <code>UPDATE<\/code> statements will be faster because the key cache (MyISAM\u2019s equivalent of the InnoDB buffer pool) will only be flushed after all of the <code>UPDATE<\/code> queries are completed.<\/p>\n<ul>\n<li><code>WHERE<\/code><strong> and <\/strong><code>LIMIT<\/code> \u2013 the <code>WHERE<\/code> and (or) <code>LIMIT<\/code> clauses can dramatically speed up the performance of <code>UPDATE<\/code> queries too since they would limit the number of rows that would be updated. Always remember to specify them if you don\u2019t need to update the entire table.<\/li>\n<li><strong>The load of your database<\/strong> \u2013 finally, beware of the usage and load of your database before updating data. Updating data at 1AM at night will likely be a better option than updating it at peak usage times if you\u2019re updating data in a live environment. The aim of this approach is to ensure that the database has to put in as little effort as possible.<\/li>\n<\/ul>\n<p>If you come across a situation where you need to update bigger sets of data (millions of rows and above), it\u2019s also a good idea to keep additional tips in mind. We walk you through these below.<\/p>\n<h2>Optimizing UPDATE Queries for Bigger Data Sets<\/h2>\n<p>If you\u2019re updating a column and have a lot of data to update, there\u2019s a neat workaround you can employ by making use of a <code>DEFAULT<\/code> clause. Perform these steps (replace <code>demo_table<\/code> with your table name and <strong>only include columns that you want to keep the data from in the new table.<\/strong>):<\/p>\n<ol>\n<li>Issue a query such as: <br \/>\n<code>SELECT username,email,registration_date <\/code><br \/>\n<code>FROM demo_table <\/code><br \/>\n<code>INTO OUTFILE \u2018\/tmp\/backup.txt\u2019;<\/code> <br \/>\nThis will take a backup that doesn\u2019t come with much overhead when re-importing (<code>SELECT INTO OUTFILE<\/code> statements remove the clutter that comes with <code>INSERT<\/code> queries and only backs up raw data \u2013 you might need to use <code>FIELDS TERMINATED TO<\/code> to specify a delimiter that tells MySQL where one column ends and the other begins. <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/load-data.html\">Refer to the documentation<\/a> for more information.)<\/li>\n<li>Create a table identical to the one you\u2019re working with by running a <code>SHOW CREATE TABLE<\/code> statement, then setting the default value of a column you need to update \u2013 here we set the default value of the column ip_address: <br \/>\n<img loading=\"lazy\" decoding=\"async\" width=\"891\" height=\"240\" class=\"wp-image-97260\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/07\/word-image-97259-1.png\" \/><\/li>\n<li>Re-import the partial backup without specifying the column with the default value inside of the operation (it will be populated by default.) Specify <code>IGNORE<\/code> if there are more fields in the file than the amount of columns in the database and specify a <code>FIELDS TERMINATED BY<\/code> option at the end of the query if the fields are terminated by a character too: <br \/>\n<code>LOAD DATA INFILE \u2018\/tmp\/backup.txt\u2019 IGNORE <br \/>\nINTO TABLE users (username,email,registration_date);<br \/>\n<\/code> Such an approach is exceptionally useful for cases involving huge amounts of data that need to be updated to the same value: this operation will always be significantly faster than running an <code>UPDATE<\/code> query because that way MySQL will think that the column is already populated by default. Additionally, as <code>LOAD DATA INFILE<\/code> is able to skip certain operations involving overhead, millions of rows can be updated in a matter of seconds. For more details, refer to the documentation of MySQL or our coverage of <code>LOAD DATA INFILE<\/code>.<\/li>\n<\/ol>\n<h2>ALTER vs. UPDATE in MySQL<\/h2>\n<p>The advice above should be a good starting point for those of you who frequently update data \u2013 however, always keep in mind that as MySQL develops, new issues arise too. For example, did you know that even though <code>ALTER<\/code> and <code>UPDATE<\/code> are two different queries, one of them updates data, but another one updates structures, including columns (changes their data types, names, etc.)?<\/p>\n<p>That\u2019s why you need to familiarize yourself with <code>ALTER<\/code> too: for frequent readers of this blog this query won\u2019t cause many problems, but those who aren\u2019t too familiar with its internals should always keep in mind that when <code>ALTER<\/code> is being run, <em>copies of the table are being made in the background<\/em>. <code>ALTER<\/code> works like this (for the purposes of this example, we will assume the table that is being modified is called \u201c<code>A<\/code>\u201d and a new table is called \u201c<code>B<\/code>\u201d):<\/p>\n<ul>\n<li>A copy of the table <code>A<\/code> is made \u2013 it\u2019s called table <code>B<\/code>.<\/li>\n<li>All data within the table <code>A<\/code> is copied into table <code>B<\/code>.<\/li>\n<li>All of the necessary operations (changes) are performed on table <code>B<\/code>.<\/li>\n<li>The table <code>B<\/code> is switched with the table <code>A<\/code>.<\/li>\n<li>Table <code>B<\/code> is destroyed<\/li>\n<\/ul>\n<p>For some, the steps defined above are the primary reason behind the struggle around storage \u2013 some developers are quick to point out that they\u2019re \u201crunning out of space for no apparent reason\u201d when an <code>ALTER<\/code> query is in progress \u2013 that\u2019s the reason why.<\/p>\n<h2>UPDATE Queries in MyISAM<\/h2>\n<p>It\u2019s widely known that MyISAM is obsolete, but if you find yourself using MyISAM instead of InnoDB or XtraDB for your <code>COUNT(*)<\/code> queries (MyISAM stores the row count inside of its metadata which isn\u2019t the same for InnoDB), look into the following settings:<\/p>\n<ul>\n<li><code>key_buffer_size<\/code> is the equivalent to <code>innodb_buffer_pool_size<\/code> \u2013 the bigger this value is, the faster <code>UPDATE<\/code> statements will finish.<\/li>\n<li>If you find yourself using <code>LOAD DATA INFILE<\/code> to update data, look into the <code>bulk_insert_buffer_size<\/code> parameter. Since MyISAM uses this parameter to make <code>LOAD DATA INFILE<\/code> faster, it\u2019s recommended you increase its size. This parameter is also used whenever ALTER statements are being run. As we\u2019ve already explained above, such statements are not exactly UPDATE queries, but they can be used to update column names instead of data within them.<\/li>\n<\/ul>\n<p>If you must run <code>UPDATE<\/code> operations on MyISAM, you should be wary of updating rows to a value larger than their specified length since <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/update-optimization.html\">MySQL documentation<\/a> states that doing so may split the row and would require you to occasionally run <code>OPTIMIZE TABLE<\/code> queries. Finally, consider switching your storage engine to InnoDB or XtraDB as MyISAM is only a fit for running simple <code>COUNT(*)<\/code> queries \u2013 since the storage engine holds the number of rows inside of its metadata, it can return results quickly, while InnoDB cannot.<\/p>\n<h2>Summary<\/h2>\n<p>At the end of the day, updating data isn\u2019t rocket science \u2013 MySQL documentation states that all update statements can be optimized just like <code>SELECT<\/code> queries just with an overhead of a write, plus additional writes if you modify indexed columns. All of the advice given above resonates with that \u2013 indexes slow down <code>UPDATE<\/code> statements, retrieving data when the database isn\u2019t at its peak is a good decision, delaying updates and performing many updates in a row later on is a way to go too, and <code>LOAD DATA INFILE<\/code> is significantly faster than <code>INSERT INTO<\/code> as well.<\/p>\n<p><a id=\"post-97259-_heading=h.gjdgxs\"><\/a> Optimizing <code>UPDATE<\/code> operations in MySQL may not be a piece of cake \u2013 but it isn\u2019t rocket science either. Some basic knowledge does indeed go a long way \u2013 familiarize yourself with the internals and the advice above, <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/5.7\/en\/update-optimization.html\">read the documentation<\/a>, and your database should be well on its way to quickly perform <code>UPDATE<\/code> operations.<\/p>\n<p><a id=\"post-97259-_heading=h.oza3egje5q5b\"><\/a> However, simple <code>UPDATEs<\/code> never saved a database from disaster \u2013 read up on other articles in these series and learn how to improve the speed of <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/optimizing-queries-in-mysql-optimizing-reads\/\">SELECT<\/a>, <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/optimizing-mysql-adding-data-to-tables\/\">INSERT<\/a>, and <code>DELETE<\/code> queries, and until next time.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous parts of these MySQL optimization series, we\u2019ve told you how queries work on a high level, then dived deeper into the power of SELECT and INSERT statements. In this blog, I will cover some of the ways to optimize modifying your data too. UPDATE Queries \u2013 the Basics As the name of&#8230;&hellip;<\/p>\n","protected":false},"author":339547,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,145792],"tags":[],"coauthors":[146040],"class_list":["post-97259","post","type-post","status-publish","format-standard","hentry","category-featured","category-mysql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97259","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/339547"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=97259"}],"version-history":[{"count":2,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97259\/revisions"}],"predecessor-version":[{"id":97262,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/97259\/revisions\/97262"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=97259"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=97259"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=97259"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=97259"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}