In the previous parts of this MySQL optimization series, we covered how queries work on a high level, then dived deeper into the power of SELECT and INSERT statements. In this blog, I will cover some of the ways to optimize the queries that modify your data too.
UPDATE Queries – the Basics
As the name of the statement suggests, UPDATE queries provide us with the ability to make changes to our existing data. The UPDATE
statement modifies the values of rows in a table and in its most basic form, looks like so:
UPDATE demo_table
SET [column] = ['value']
[WHERE specific details]
[LIMIT x];
Where:
- The column after the SET keyword defines the specific column we want to update. We can update multiple columns at once by listing additional column = value pairs separated by commas.
- [specific details] refers to the condition after the WHERE clause. This part of the query is used to update only the rows matching a given condition – for example, WHERE id > 500 would update all of the rows with an ID higher than 500. These conditions are known as the predicate of the query.
- [LIMIT x] allows us to update only the specified number of rows (LIMIT 100 would only update 100 rows, LIMIT 200 would update 200 rows, etc.) Note that unlike SELECT, the LIMIT clause of an UPDATE accepts only a row count – an offset (such as LIMIT 100,200) is not supported.
- An ORDER BY option makes the rows be updated in the given order; add an ASC or DESC keyword to process the rows in ascending or descending order.
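Put together, a complete statement might look like the sketch below – the table, its columns, and the values are all hypothetical:

```sql
-- Deactivate the 100 longest-dormant accounts, oldest first.
-- (demo_table and its columns are illustrative, not from the original post.)
UPDATE demo_table
SET status = 'inactive'
WHERE last_login < '2020-01-01'
ORDER BY last_login ASC
LIMIT 100;
```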
These are the basics of the UPDATE
statement – as you can tell, these queries are nothing fancy, but their performance can be made better by following a couple of tips.
Optimizing UPDATE Queries
Most MySQL DBAs know that indexes make UPDATE
statements slow – however, while this statement is true, there’s much more to updating data than just that. When updating data, we need to keep the following things in mind:
- Indexing – indexes have both positive and negative effects on UPDATE statements. This is because an UPDATE is logically a two-step process: one step to find the rows to modify, another to modify the data.
  - Positive – if you are updating a small number of rows (most UPDATE statements only affect one row, usually accessed through a primary key value), an index on the columns in the predicate improves performance by speeding up access to the rows that need to be modified.
  - Negative – indexes can slow UPDATE statements down when you modify columns that are indexed. When an index is in place and the UPDATE changes an indexed column, the statement needs to update both the data and the index – that's some additional overhead right then and there.
- Partitions – as a rule, partitions slow down UPDATE and DELETE operations while making SELECT statements faster in return, and they also have a couple of caveats unique to themselves:
  - Partitioning integers – suppose that we partition a table by range so that partition A holds integers less than 1000 and partition B holds integers from 1000 to 1999. Then, suppose that we update our data with a query like so:
UPDATE `demo_table`
SET `partitioned_column` = 1500;
Such a query means that rows whose values originally resided in partition A would now be located in partition B – the update moves the rows from partition A to partition B. For some, that may be a source of confusion, so it's useful to keep that in mind.
  - Beware of NULL values – if you insert a row whose partitioning column is NULL and then update your partitions, beware that the row will reside in the lowest partition possible. For more information on how partitioning handles NULL values, refer to the documentation on handling NULL values in partitioning.
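The range layout assumed in that example could be defined like so – the table and column names are illustrative:

```sql
-- A hypothetical table partitioned by range on partitioned_column:
-- pA holds values below 1000, pB holds values from 1000 up to 1999.
CREATE TABLE demo_table (
    id INT NOT NULL,
    partitioned_column INT NOT NULL
)
PARTITION BY RANGE (partitioned_column) (
    PARTITION pA VALUES LESS THAN (1000),
    PARTITION pB VALUES LESS THAN (2000)
);
```

With this layout, setting partitioned_column to 1500 forces MySQL to physically move the affected rows from pA into pB.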
- Locking – if we lock the table, perform many smaller updates one by one (we can use WHERE clauses to split the work into chunks), and then unlock the table, the overall operation will likely be much faster than running a single UPDATE query that updates many rows at once. Such an approach would look like this (replace x with the name of your table, column with the name of your column, and id with your primary key; since UPDATE's LIMIT clause takes no offset, the chunks are expressed with WHERE, and you can size them as you see fit):
LOCK TABLES x WRITE;
UPDATE x SET column = 'value' WHERE id BETWEEN 1 AND 50000;
UPDATE x SET column = 'value' WHERE id BETWEEN 50001 AND 100000;
UPDATE x SET column = 'value' WHERE id BETWEEN 100001 AND 150000;
...
UNLOCK TABLES;
It is also worth noting that locks come in two types – a table can be locked for writing (WRITE) or for reading (READ). A WRITE lock gives the locking session exclusive access: other sessions can neither read from nor write to the table until it is unlocked. A READ lock lets other sessions read the table, but blocks their INSERT, UPDATE, and DELETE operations. Everything proceeds as usual once the lock is released (once the table is unlocked.)
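A minimal illustration of the difference – the two sessions and the table name are hypothetical:

```sql
-- Session 1: take a READ lock.
LOCK TABLES demo_table READ;

-- Session 2: reads succeed, writes wait.
SELECT COUNT(*) FROM demo_table;        -- returns immediately
INSERT INTO demo_table (id) VALUES (1); -- blocks until session 1 unlocks

-- Session 1: release the lock; session 2's INSERT can now complete.
UNLOCK TABLES;
```

Had session 1 taken a WRITE lock instead, session 2's SELECT would have blocked as well.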
Also, keep in mind that if you use the MyISAM engine and run LOCK TABLES queries in the same fashion, your UPDATE statements will be faster still, because the key cache (MyISAM's equivalent of the InnoDB buffer pool) will only be flushed after all of the UPDATE queries are completed.
- WHERE and LIMIT – the WHERE and (or) LIMIT clauses can dramatically speed up the performance of UPDATE queries too, since they limit the number of rows that are updated. Always remember to specify them if you don't need to update the entire table.
- The load of your database – finally, consider the usage and load of your database before updating data. If you're updating data in a live environment, doing so at 1 AM will likely be a better option than doing so at peak usage times. The aim of this approach is to ensure that the database has to put in as little effort as possible.
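To check whether an UPDATE will locate its rows through an index (the "positive" indexing case above), MySQL can EXPLAIN an UPDATE statement just like a SELECT – the table and column names here are hypothetical:

```sql
-- If id is the primary key, the "key" column of the EXPLAIN output
-- should show PRIMARY: the row is found via the index, not a full scan.
EXPLAIN UPDATE demo_table
SET status = 'inactive'
WHERE id = 500;
```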
If you come across a situation where you need to update bigger sets of data (millions of rows and above), it’s also a good idea to keep additional tips in mind. We walk you through these below.
Optimizing UPDATE Queries for Bigger Data Sets
If you're updating a column and have a lot of data to update, there's a neat workaround you can employ by making use of a column's DEFAULT value. Perform these steps (replace demo_table with your table name, and only include the columns whose data you want to keep in the new table):
- Issue a query such as:
SELECT username,email,registration_date
FROM demo_table
INTO OUTFILE '/tmp/backup.txt';
This will take a backup that doesn't come with much overhead when re-importing (SELECT ... INTO OUTFILE statements remove the clutter that comes with INSERT queries and back up only the raw data – you might need to use a FIELDS TERMINATED BY clause to specify a delimiter that tells MySQL where one column ends and the next begins. Refer to the documentation for more information.)
- Create a table identical to the one you're working with (a SHOW CREATE TABLE statement will give you its definition), then set the default value of the column you need to update – here we set the default value of the column ip_address:
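A sketch of what that table definition could look like – the column types and the default value are assumptions, not taken from the original table:

```sql
-- Same structure as the source table, but the column we want to
-- "update" now carries a DEFAULT value (types are illustrative):
CREATE TABLE users (
    username VARCHAR(64) NOT NULL,
    email VARCHAR(128) NOT NULL,
    registration_date DATETIME NOT NULL,
    ip_address VARCHAR(45) NOT NULL DEFAULT '127.0.0.1'
);
```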
- Re-import the partial backup without specifying the column with the default value inside of the operation (it will be populated by default.) The IGNORE keyword makes MySQL skip rows that would duplicate an existing unique key value instead of aborting the load, and a FIELDS TERMINATED BY option at the end of the query specifies the delimiter if the fields are terminated by a character:
LOAD DATA INFILE '/tmp/backup.txt' IGNORE
INTO TABLE users (username,email,registration_date);
Such an approach is exceptionally useful for cases involving huge amounts of data that need to be updated to the same value: this operation will always be significantly faster than running an UPDATE query, because MySQL will consider the column populated by default. Additionally, as LOAD DATA INFILE is able to skip certain operations involving overhead, millions of rows can be updated in a matter of seconds. For more details, refer to the MySQL documentation or our coverage of LOAD DATA INFILE.
ALTER vs. UPDATE in MySQL
The advice above should be a good starting point for those of you who frequently update data – however, always keep in mind that as MySQL develops, new issues arise too. For example, did you know that ALTER and UPDATE are two very different queries? One of them updates data, while the other updates structures – changing columns' data types, names, and so on.
That's why you need to familiarize yourself with ALTER too: for frequent readers of this blog this query won't cause many problems, but those who aren't too familiar with its internals should always keep in mind that when an ALTER is run, copies of the table are made in the background. ALTER works like this (for the purposes of this example, we will assume the table being modified is called "A" and the new table is called "B"):
- A copy of table A is made – it's called table B.
- All data within table A is copied into table B.
- All of the necessary operations (changes) are performed on table B.
- Table B is switched with table A.
- The old table is destroyed.
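Conceptually, the same steps could be performed by hand like so – this is a sketch only (real ALTERs also handle triggers, foreign keys, and concurrent writes), and the column definitions are illustrative:

```sql
-- 1: create table B with the desired new structure.
CREATE TABLE B (
    id INT PRIMARY KEY,
    some_column BIGINT   -- the changed data type
);

-- 2-3: copy all of A's data into B, converting it along the way.
INSERT INTO B SELECT id, some_column FROM A;

-- 4: switch B with A atomically.
RENAME TABLE A TO A_old, B TO A;

-- 5: destroy the old table.
DROP TABLE A_old;
```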
For some, the steps defined above are the primary source of storage struggles – some developers are quick to point out that they're "running out of space for no apparent reason" while an ALTER query is in progress. The background table copy is the reason why.
UPDATE Queries in MyISAM
It's widely known that MyISAM is obsolete, but if you find yourself using MyISAM instead of InnoDB or XtraDB – typically for fast COUNT(*) queries, since MyISAM stores the row count inside of its metadata while InnoDB does not – look into the following settings:
- key_buffer_size is the MyISAM equivalent of innodb_buffer_pool_size – the bigger this value is, the faster UPDATE statements will finish.
- If you find yourself using LOAD DATA INFILE to update data, look into the bulk_insert_buffer_size parameter. Since MyISAM uses this buffer to make LOAD DATA INFILE faster, it's recommended you increase its size. This parameter is also used whenever ALTER statements are run – as we've already explained above, such statements are not exactly UPDATE queries, but they can be used to update column names rather than the data within them.
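Both parameters are dynamic, so they can be adjusted at runtime (or made permanent in my.cnf) – the values below are purely illustrative, so size them to your available memory:

```sql
-- Illustrative sizes; tune to your server's RAM.
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;        -- 256 MB key cache
SET GLOBAL bulk_insert_buffer_size = 64 * 1024 * 1024; -- 64 MB per thread
```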
If you must run UPDATE operations on MyISAM, be wary of updating rows to values larger than their specified length: the MySQL documentation states that doing so may split the row, requiring you to occasionally run OPTIMIZE TABLE to defragment the table. Finally, consider switching your storage engine to InnoDB or XtraDB, as MyISAM is only a good fit for running simple COUNT(*) queries – since the storage engine holds the number of rows inside of its metadata, it can return such results quickly, while InnoDB cannot.
Summary
At the end of the day, updating data isn't rocket science – the MySQL documentation states that UPDATE statements are optimized like SELECT queries, just with the overhead of a write, plus additional writes if you modify indexed columns. All of the advice given above resonates with that: indexes slow down UPDATE statements, updating data when the database isn't at peak load is a good decision, locking the table and performing many smaller updates in a row is a way to go too, and LOAD DATA INFILE is significantly faster than INSERT INTO as well.
Optimizing UPDATE operations in MySQL may not be a piece of cake – but it isn't rocket science either. Some basic knowledge does indeed go a long way: familiarize yourself with the internals and the advice above, read the documentation, and your database should be well on its way to performing UPDATE operations quickly.
However, simple UPDATEs alone never saved a database from disaster – read up on the other articles in this series to learn how to improve the speed of your SELECT, INSERT, and DELETE queries too. Until next time!