{"id":96784,"date":"2023-05-22T14:59:49","date_gmt":"2023-05-22T14:59:49","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=96784"},"modified":"2023-05-22T14:59:49","modified_gmt":"2023-05-22T14:59:49","slug":"optimizing-mysql-adding-data-to-tables","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/optimizing-mysql-adding-data-to-tables\/","title":{"rendered":"Optimizing MySQL: Adding Data to Tables"},"content":{"rendered":"<p>Welcome back to the MySQL optimization series! In case you haven\u2019t been following this series, in the past couple of articles we have discussed the basics of query optimization, and told you <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/optimizing-queries-in-mysql-optimizing-reads\/\">how to optimize SELECT queries<\/a> for performance as well.<\/p>\n<p>In this blog, we\u2019re further learning ways to optimize <code>INSERT<\/code> operations and look at alternatives when you need to load more than a few rows in the <code>LOAD DATA INFILE<\/code> statement.<\/p>\n<h2>How Does INSERT Work?<\/h2>\n<p>As the name suggests, <code>INSERT<\/code> is the query that\u2019s used to create data in tables in a database. 
The internal functioning of a basic <code>INSERT<\/code> query is pretty much identical to that of a <code>SELECT<\/code> statement as seen in the previous article: first, the database checks for permissions, opens tables, initializes operations, checks whether rows need to be locked, inserts the data (updating any indexes), and then finishes and comes to a stop.<\/p>\n<p>We can back this up by performing some profiling:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1476\" height=\"391\" class=\"wp-image-96785\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/d-redgate-blogs-october-2022-optimiziation-series.png\" alt=\"INSERT query profiling - the query\" \/><\/p>\n<p><em>Image 1 &#8211; INSERT Query Profiling \u2013 the Query<\/em><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"409\" height=\"422\" class=\"wp-image-96786\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/d-redgate-blogs-october-2022-optimiziation-series-1.png\" alt=\"INSERT query profiling - status and duration\" \/><\/p>\n<p><em>Image 2 &#8211; INSERT Query Profiling \u2013 Status and Duration<\/em><\/p>\n<p>To know what these status codes mean in detail, please refer to the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/optimizing-queries-in-mysql-optimizing-reads\/\">previous blog about SELECT queries, where we explain them one by one<\/a>, but in a nutshell, the profiling codes outlined above enable MySQL to figure out the following things, amongst others:<\/p>\n<ul>\n<li>Whether appropriate privileges have been set.<\/li>\n<li>Whether tables are ready to perform any operations on (i.e., whether they are locked or not.)<\/li>\n<li>How best to perform the <code>INSERT<\/code> operation. 
This step includes*:\n<ul>\n<li>Sending the query to the server.<\/li>\n<li>Parsing the query.<\/li>\n<li>Scanning the table for any indexes or partitions.<\/li>\n<li>Inserting rows into tables.<\/li>\n<li>Adding data to applicable indexes.<\/li>\n<li>Closing the process.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>* Do note that each step takes up a proportion of the total time <code>INSERT<\/code> queries take. <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/insert-optimization.html\">According to the MySQL<\/a> documentation, the fastest step is closing the query, the second fastest is sending the query to the server and parsing it, and the slowest is adding the data to the table. (Note that for very small amounts of data, connecting to the server can be more costly).<\/p>\n<p>As far as profiling is concerned, it\u2019s very helpful when analyzing read operations, but less so in other cases: knowing how queries work is a good thing, but when optimizing <code>INSERT<\/code> queries, profiling doesn\u2019t take us far, which is why we need to employ the techniques that we\u2019ll share with you in a second.<\/p>\n<p>Another thing to remember is that no matter what storage engine is in use (<code>InnoDB<\/code>, etc.), many queries go through the same steps \u2013 <code>INSERT<\/code> queries share many steps with <code>SELECT<\/code> queries too. However, while there are a lot of shared steps, it\u2019s important to remember that all queries are different in their own regard \u2013 <strong>an <code>INSERT<\/code> query is different from a <code>SELECT<\/code> query in that <code>SELECT<\/code> queries benefit from indexes and partitions, while they generally make <code>INSERT<\/code> queries slower.<\/strong><\/p>\n<p>Advanced DBAs will also know that <code>INSERT<\/code> queries are sometimes used in concert with <code>SELECT<\/code> queries \u2013 we\u2019ll start from the basic ways that will help you improve your 
<code>INSERT<\/code> query performance, then gradually evolve towards more complex scenarios.<\/p>\n<h2>Basic Ways to Improve INSERT Query Performance<\/h2>\n<p>To improve <code>INSERT<\/code> query performance, start with the advice above: profile your queries, then remove indexes or partitions on a table if necessary. All indexes and partitions will make all <code>INSERT<\/code> queries slower for one simple reason \u2013 every time data is inserted into a table, all indexes and partitions on that table must be updated for the database to know where the row resides. That\u2019s true no matter what kind of partitions or indexes are in use \u2013 the more partitions or indexes exist on a table, the slower your query will become.<\/p>\n<p><em>Note: Remember that performance tuning must be treated as a holistic activity and we are focusing on tuning <code>INSERT<\/code> statements only here. Indexes are needed by most of your SQL statements and unless you are adding a LOT of data to your table, removing them just to make a particular <code>INSERT<\/code> operation go a little bit faster (and then killing read performance) is typically not desirable.<\/em><\/p>\n<p>Another very popular and simple way to improve query performance is to combine small operations into a single, larger operation. We can also lock tables and unlock them only when all operations are completed, and these approaches look like so:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1916\" height=\"99\" class=\"wp-image-96787\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/d-misc-insert-query-optimization-screenshot_1-png.png\" alt=\"A combined multi-row INSERT statement\" \/><\/p>\n<p><em>Image 3 &#8211; a basic way to improve INSERT query performance.<\/em><\/p>\n<p>And that\u2019s the crux of it \u2013 to optimize <code>INSERT<\/code> query speed, we need to \u201cthink big\u201d in terms of queries. 
The whole magic goes like this: <em>instead of having many queries that insert a couple of rows each, run one huge query that inserts many rows at once after making sure that all the table locks, index updates, and consistency checking are delayed as much as possible (ideally until the very end of the query.)<\/em><\/p>\n<p>When it is feasible, delaying everything isn\u2019t hard and can be done by following some or all of these steps. Consider performing these steps if you\u2019re working with more than a million rows and (or) whenever you see that your <code>INSERT<\/code> queries can be combined into a single operation just like in the example above (it\u2019s not necessary to follow the steps from top to bottom \u2013 following one or two steps will usually be a good start):<\/p>\n<ol>\n<li>Lock the tables before inserting data into them and unlock them once all the data has been inserted, but not before (see example above.) By locking your tables, you will ensure that the data within them will not be modified while the data is being inserted.<\/li>\n<li>Consider starting a transaction (<code>START TRANSACTION<\/code>) before running <code>INSERT<\/code> queries \u2013 once done, issue a <code>COMMIT<\/code> query.<\/li>\n<li>Avoid adding any indexes before insert operations have been completed.<\/li>\n<li>If possible, make sure that the table you\u2019re inserting data into is not partitioned, because partitioning splits your table into sub-tables, and once you insert data into your table, MySQL has to go through a couple of additional steps to figure out which partition to insert what data into. Also, while the proper use of partitions helps improve the performance of <code>SELECT<\/code> statements, partitioned tables will inevitably take up a little more space on the disk, so be wary of that.<\/li>\n<\/ol>\n<p>Follow these steps and that\u2019s it \u2013 you\u2019re on the way to <code>INSERT<\/code> heaven! 
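As a minimal sketch of steps 1 and 2 (the table and column names here are placeholders), this could look like so \u2013 note that we use <code>SET autocommit = 0<\/code> rather than <code>START TRANSACTION<\/code>, because <code>LOCK TABLES<\/code> implicitly commits any open transaction:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SET autocommit = 0;\r\nLOCK TABLES demo_table WRITE;\r\nINSERT INTO demo_table (first_name, last_name) VALUES ('John', 'Doe'), ('Jane', 'Doe');\r\nINSERT INTO demo_table (first_name, last_name) VALUES ('Joe', 'Bloggs'), ('Jill', 'Bloggs');\r\nCOMMIT;\r\nUNLOCK TABLES;<\/pre>\n<p>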
However, do note that these are only the basics and such an approach may not cover your scenario \u2013 in that case, we may need to perform some additional optimization by adjusting the number of I\/O threads within <code>InnoDB<\/code> by modifying <code>my.cnf<\/code> (refer to the options in the screenshot below), or improve query performance by doing other things you will learn in this article. Keep reading \u2013 your <code>INSERT<\/code> query performance will soon skyrocket!<\/p>\n<h2>Improving INSERT Query Performance Beyond the Basics<\/h2>\n<p>Once you\u2019re sure that all of your <code>INSERT<\/code> queries are profiled and doing what you expect, employ the basic steps outlined above to improve their performance. If that didn\u2019t help, consider the following advice:<\/p>\n<ul>\n<li>We can start an explicit transaction using <code>START TRANSACTION<\/code> or set <code>autocommit=0<\/code>, run all of our <code>INSERT<\/code> queries, then run a <code>COMMIT<\/code> query. <em>By doing so we delay commit operations until after the very last <code>INSERT<\/code> query has finished<\/em> \u2013 in that case, MySQL saves time because it doesn\u2019t commit SQL statements as soon as they\u2019re executed, but commits them all at once later instead. The bigger the data set is, the more time will be saved. 
<br \/>\nSuch an approach isn\u2019t very feasible with billions of rows (<code>INSERT<\/code> queries are not designed for more than 50 or 100 million rows at a time &#8211; this somewhat depends on the hardware in use &#8211; and we need to use <code>LOAD DATA<\/code> for that since <code>INSERT<\/code> statements come with a lot of overhead \u2013 more on that later), but if we\u2019re not dealing with that many rows, it certainly could be viable.<\/li>\n<li>We can modify the <code>my.cnf<\/code> file and add or modify the following options to increase the I\/O capabilities of <code>InnoDB<\/code> (the main storage engine within MySQL): <br \/>\n<img loading=\"lazy\" decoding=\"async\" width=\"447\" height=\"106\" class=\"wp-image-96788\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/d-redgate-blogs-october-2022-optimiziation-series-2.png\" alt=\"InnoDB I\/O options in my.cnf\" \/><br \/>\n<em>Image 3 &#8211; Increasing I\/O Capabilities of InnoDB<\/em><\/p>\n<ul>\n<li>The <code>innodb_read_io_threads<\/code> option sets the number of threads handling read operations \u2013 in most cases, the default value (4) is sufficient, though it can be raised on servers with very capable storage.<\/li>\n<li>The <code>innodb_write_io_threads<\/code> option sets the number of threads handling write operations \u2013 again, the default value (4) is sufficient in most cases.<\/li>\n<li>The <code>innodb_io_capacity<\/code> option defines the total I\/O capacity of InnoDB and <em>its value can be set up to the number of IOPS available to the server.<\/em> <br \/>\nIt is mostly related to the threads that perform various database-related tasks in the background (working with the buffer pool, writing changes, etc.) \u2013 these background threads aim to do their work without negatively affecting InnoDB\u2019s foreground operations. 
For more information, head over to <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/innodb-configuring-io-capacity.html\">the MySQL documentation<\/a>.<\/li>\n<li>For more details on optimizing <code>my.cnf<\/code> for general performance, I previously posted the following article on Simple-Talk: <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/mysql\/optimizing-my-cnf-for-mysql-performance\/\">Optimizing my.cnf for MySQL performance<\/a>.<\/li>\n<\/ul>\n<\/li>\n<li>We can insert more data at once by using the bulk insert operation available within the <code>INSERT<\/code> statement itself. When using such an approach, we can also omit the column list if we load data into all of the columns at once. Do note that if we omit the column list and the table has an <code>AUTO_INCREMENT<\/code> column, we must specify its value as NULL (an example is given below.)<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1919\" height=\"189\" class=\"wp-image-96789\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/d-misc-insert-query-optimization-screenshot_2-png.png\" alt=\"A bulk INSERT statement\" \/><\/p>\n<p><em>Image 4 \u2013 INSERT INTO Example<\/em><\/p>\n<h2>Additional considerations for INSERT performance<\/h2>\n<p>The steps covered previously will help you optimize your <code>INSERT<\/code> queries; however, these are only the basics. 
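For instance, a multi-row <code>INSERT<\/code> into a table whose first column is an <code>AUTO_INCREMENT<\/code> id (the three-column table and its values are illustrative) could look like so:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">INSERT INTO demo_table\r\nVALUES (NULL, 'John', 'Doe'),\r\n       (NULL, 'Jane', 'Doe'),\r\n       (NULL, 'Joe', 'Bloggs');<\/pre>\n<p>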
Advanced DBAs know of a couple of additional ways to improve their <code>INSERT<\/code> query performance, and some of the advanced tips include combining <code>INSERT<\/code> statements with <code>SELECT<\/code> statements too!<\/p>\n<h3>Locks and Inserting rows concurrently<\/h3>\n<p>Many of you know that issuing <code>INSERT<\/code> statements means locking for <code>InnoDB<\/code>: MySQL deals with each statement differently, and as far as <code>INSERT<\/code> statements are concerned, <code>InnoDB<\/code> row-level locks are advantageous for the end-user: the storage engine sets a lock on the inserted row, thus not disrupting any operations on any of the other rows in the same table. Users can still work with their <code>InnoDB<\/code>-based tables as they\u2019re inserting data, as long as the rows they need are not locked and no table lock was taken.<\/p>\n<p>Before any row is inserted, MySQL also lets <code>InnoDB<\/code> know that a row is going to be inserted by setting an insert intention lock. The purpose of that lock is to let other transactions know that data is being inserted into a certain position, so that other transactions do not insert data into the same position.<\/p>\n<p>To make the best use of locks within <code>InnoDB<\/code>, follow the advice above \u2013 <em>make use of bulk <code>INSERT<\/code> statements and<\/em> <em>consider delaying all commit operations until after the very last <code>INSERT<\/code> query: bulk <code>INSERT<\/code> statements will insert data in a quicker fashion and committing an operation only after the last query will be faster as well<\/em>.<\/p>\n<h3>Inserting rows from a SELECT Query<\/h3>\n<p>As noted above, <code>INSERT<\/code> queries are the primary way of inserting data into MySQL, but many DBAs will know that <code>INSERT<\/code> queries are not always simple either: certain situations may require us to use them in concert with other \u2013 typically <code>SELECT<\/code> \u2013 queries too. 
A query like so would also work quite successfully:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">INSERT INTO table_name (FirstColumnName, SecondColumnName)\r\nSELECT  FirstColumnName, SecondColumnName \r\nFROM another_table [options]<\/pre>\n<p>For example:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" width=\"747\" height=\"139\" class=\"wp-image-96790\" src=\"https:\/\/www.red-gate.com\/simple-talk\/wp-content\/uploads\/2023\/05\/d-misc-insert-query-optimization-screenshot_3-png.png\" alt=\"An INSERT ... SELECT query\" \/><\/p>\n<p><em>Image 5 \u2013 INSERT and SELECT Queries Together<\/em><\/p>\n<p>Note that in this case indexes will not be beneficial to the table where new rows are being created, but they will be beneficial when fetching rows from the table in the <code>SELECT<\/code> statement. Note that the <code>another_table<\/code> reference in the sample query can be the same table. So:<\/p>\n<pre class=\"\">INSERT INTO table_name (FirstColumnName, SecondColumnName) \r\nSELECT FirstColumnName, SecondColumnName \r\nFROM table_name [options]<\/pre>\n<p>is possible as well, in which case indexes can (depending on whether you have a <code>WHERE<\/code> clause) be both helpful and detrimental at the same time.<\/p>\n<h3>The CREATE TABLE \u2026 SELECT Query<\/h3>\n<p>To select data directly into a new table, we can also employ a <code>CREATE TABLE<\/code> statement together with a <code>SELECT<\/code> query. In other words, we can create a table and insert data into it without running a separate <code>INSERT<\/code> query afterwards. 
There are two ways to do this:<\/p>\n<ol>\n<li>Run a <code>CREATE TABLE<\/code> statement with a <code>SELECT<\/code> statement without defining any columns or data types: <br \/>\n<code>CREATE TABLE new_table [AS] SELECT [columns] FROM old_table;<\/code> <br \/>\nMySQL will recreate all columns, data types, and data from the old_table.<\/li>\n<li>You can also define the columns and constraints and insert data into the new table like so: <br \/>\n<code>CREATE TABLE new_table ( <\/code><br \/>\n<code>column1 VARCHAR(5) NOT NULL, <\/code><br \/>\n<code>column2 VARCHAR(200) NOT NULL DEFAULT 'None' <\/code><br \/>\n<code>) ENGINE = InnoDB <\/code><br \/>\n<code>SELECT demo_column1, demo_column2 <\/code><br \/>\n<code>FROM old_table;<\/code><\/li>\n<\/ol>\n<p>Creating a table and populating it in one statement like this is generally faster than creating the table first and then running a separate <code>INSERT INTO ... SELECT<\/code> query.<\/p>\n<h2>The Best Way to Load Massive Data Sets \u2013 INSERT vs. LOAD DATA INFILE<\/h2>\n<p><code>LOAD DATA INFILE<\/code> is built for blazing fast data insertion from text files. Speed is achieved by eliminating much of the overhead posed by <code>INSERT<\/code> queries:<\/p>\n<ul>\n<li>Working with \u201ccleaner\u201d data (data only separated by certain delimiters \u2013 think \u201c,\u201d, \u201c:\u201d, or \u201c|\u201d \u2013 or not separated at all.)<\/li>\n<li>Providing us with the ability to only load data into specific columns or skip loading data into certain rows or columns altogether.<\/li>\n<\/ul>\n<p>To use <code>LOAD DATA INFILE<\/code>, make sure you have a file that\u2019s separating its columns by a certain delimiter (common delimiters include, but are not limited to, \u201c,\u201d, the <code>TAB<\/code> sign, spaces, and the \u201c|\u201d, \u201c:\u201d, and \u201c-\u201c characters) and preferably one that\u2019s saved in a <code>CSV<\/code> or <code>TXT<\/code> format. 
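For example, a pipe-delimited file (the contents here are made up for illustration) might look like so:<\/p>\n<pre class=\"\">1|John|Doe\r\n2|Jane|Doe\r\n3|Joe|Bloggs<\/pre>\n<p>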
Save the file in a directory, remember the path towards that directory, then use the following query (replace the path to the file with your file, the \u201c|\u201d sign with your delimiter, and <code>demo_table<\/code> with the name of your table) \u2013 use IGNORE to ignore all errors posed by the query (duplicate key issues, etc.):<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">LOAD DATA INFILE '\/var\/lib\/mysql\/tmp\/data.csv' [IGNORE] INTO TABLE demo_table FIELDS TERMINATED BY '|';<\/pre>\n<p>To export data from your database in a format that can be re-imported by using the <code>LOAD DATA INFILE<\/code> query, use the <code>SELECT * INTO OUTFILE<\/code> query (note that <code>SELECT ... INTO OUTFILE<\/code> does not accept an <code>IGNORE<\/code> clause):<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">SELECT *\r\nFROM demo_table \r\nINTO OUTFILE '\/var\/lib\/mysql\/tmp\/data.csv' \r\nFIELDS TERMINATED BY '|';<\/pre>\n<p><code>LOAD DATA INFILE<\/code> has many parameters that can be used as well. 
These parameters include, but are not limited to:<\/p>\n<ul>\n<li>The <code>PARTITION<\/code> parameter allows us to define the partition we want to insert data into.<\/li>\n<li>Combining the <code>IGNORE<\/code> option with the <code>LINES<\/code> or <code>ROWS<\/code> options to tell MySQL how many lines or rows to ignore when inserting data.<\/li>\n<li>Providing us with the ability to set the values of certain columns by using the <code>SET<\/code> option (the query below takes data from the file called \u201c<code>demo.csv<\/code>\u201d, loads the data with its columns terminated by \u201c:\u201d into a table called demo after ignoring the first 100 lines, and also sets the date column to the current date for all inserted rows):<\/li>\n<\/ul>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">LOAD DATA INFILE 'demo.csv' INTO TABLE demo FIELDS TERMINATED BY ':' IGNORE 100 LINES SET date=CURRENT_DATE();<\/pre>\n<p><code>LOAD DATA INFILE<\/code> can be made even faster if we employ <code>DEFAULT<\/code> constraints. If there are columns that need to be populated with the same data for each row, the <code>DEFAULT<\/code> constraint will be faster than fetching the value from the data stream. That way, all columns having the default keyword will be pre-filled without the need to load data into them.<\/p>\n<p>Such an approach is a very convenient time-saver when a column holds the same value for every row and we don\u2019t want to employ ALTER queries afterwards (those queries make a copy of the table on the disk, insert data into it, perform operations on the newly created table, then swap the two tables.) 
An example is given below:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">CREATE TABLE demo_table (\r\n`demo_prefilled_column` VARCHAR(120) NOT NULL \r\n          <strong>DEFAULT 'Value Here'<\/strong>\r\n) ENGINE = InnoDB;<\/pre>\n<p>To dive even deeper into <code>LOAD DATA INFILE<\/code>, refer to <a href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/load-data.html\">the MySQL documentation itself<\/a>.<\/p>\n<h2>Indexes, Partitions, and INSERT FAQ<\/h2>\n<p>If even <code>LOAD DATA INFILE<\/code> doesn\u2019t help, you may want to look into indexes and partitions. Remember what we said above? The more indexes or partitions your table has, the slower your <code>INSERT<\/code> query will be. You already know the drill \u2013 but you may be pressed with some extra questions, some of them being the following:<\/p>\n<table>\n<tbody>\n<tr>\n<td>\n<p><strong>Question<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>Answer<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Should I drop all of the indexes before running an <code>INSERT<\/code> query?<\/p>\n<\/td>\n<td>\n<p><strong>Typically No<\/strong> \u2013 while indexes are used to improve the performance of finding rows and generally slow down <code>INSERT<\/code> statements, it is usually fine to let the <code>INSERT<\/code> statement maintain indexes as part of the insert process. 
<\/p>\n<p>However, in some cases when loading hundreds of millions or billions of rows, indexes should certainly be removed (they can be added again by using an <code>ALTER TABLE<\/code> query.)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Does using a <code>PRIMARY KEY<\/code> constraint on a table slow <code>INSERT<\/code> statements down?<\/p>\n<\/td>\n<td>\n<p><strong>Yes<\/strong> \u2013 primary keys are always indexed and indexes make <code>INSERT<\/code> queries slower.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Does the amount of indexes or partitions on a table make a difference in <code>INSERT<\/code> query performance?<\/p>\n<\/td>\n<td>\n<p><strong>Yes<\/strong> \u2013 the more indexes or partitions you have on a table, the slower your <code>INSERT<\/code> queries will be.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Are <code>LOAD DATA INFILE<\/code> queries always faster than <code>INSERT<\/code> statements? Why?<\/p>\n<\/td>\n<td>\n<p><strong>Generally, yes<\/strong> \u2013 <code>LOAD DATA INFILE<\/code> queries come with significantly less overhead than <code>INSERT<\/code> queries do.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>When optimizing <code>INSERT<\/code> statements, is there a difference in which storage engine is in use?<\/p>\n<\/td>\n<td>\n<p><strong>Yes <\/strong>\u2013 aim to use <code>InnoDB<\/code> unless you need specific features:<\/p>\n<ul>\n<li><strong>MyISAM<\/strong> will make <code>COUNT(*)<\/code> aggregate queries faster since the storage engine stores the row count inside of its metadata.<\/li>\n<li><strong>ARCHIVE <\/strong>will let you archive data with little footprint on the disk.<\/li>\n<li><strong>MEMORY <\/strong>is useful for temporary tables since all of its data is stored in memory.<\/li>\n<li><strong>CSV <\/strong>will be advantageous if we need a way to migrate data into some sort of spreadsheet software.<\/li>\n<li><strong>For a complete list of storage engines, <a 
href=\"https:\/\/dev.mysql.com\/doc\/refman\/8.0\/en\/storage-engines.html\">check the MySQL documentation.<\/a><\/strong><\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>When inserting data, is there a difference if the table we insert data into is locked or not?<\/p>\n<\/td>\n<td>\n<p><strong>Yes <\/strong>\u2013 inserting data into locked tables is generally faster than the other way around since database management systems check for table locks as part of the query execution process.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>How do I achieve a balance between <code>INSERT<\/code> and <code>SELECT<\/code> queries?<\/p>\n<\/td>\n<td>\n<p><strong>Testing<\/strong> &#8211; Load testing is important to see what effects any change will have on a system. A lot will depend on how you are using a system, and how concurrent users use a system is important. If users are simultaneously creating and querying data, it is different than if data is loaded in a time window, then queried at other times.<\/p>\n<p>Generally, aim to have only as many indexes and partitions as necessary, <code>SELECT<\/code> as few rows as possible, and where possible, use <code>LOAD DATA INFILE<\/code> instead of <code>INSERT<\/code>.\u00a0\u00a0<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Foreign keys \u2013 are they a problem for <code>INSERT<\/code> queries?<\/p>\n<\/td>\n<td>\n<p><strong>No<\/strong> \u2013 <code>INSERT<\/code> queries will be slower on a table with foreign keys since the database must check that there are no foreign key violations, but in most cases, they won\u2019t make much of a performance difference (but will make a large data integrity difference). 
As is the case with indexes, foreign key constraints can be dropped when we have a lot of data (hundreds of millions of rows) to insert, then added back afterwards.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>The <code>CHECK<\/code> constraint \u2013 does it slow down inserting data?<\/p>\n<\/td>\n<td>\n<p><strong>Yes<\/strong> \u2013 the <code>CHECK<\/code> constraint was introduced in MySQL 8.0.16, and it lets users specify a condition to check before inserting data into a table. Think of it as an integrity check \u2013 if a violation of the constraint is found, MySQL shows an error and terminates query execution, or skips inserting the row altogether if the <code>IGNORE<\/code> clause is specified.<\/p>\n<p>In many cases, the constraint will only be problematic if it evaluates to <code>FALSE<\/code>, because in that case, queries will fail unless the <code>IGNORE<\/code> clause is specified. Aim to avoid using constraints when inserting larger quantities of data, because they could make your <code>INSERT<\/code> queries slower. (If you remove one for performance reasons, add it back for integrity\u2019s sake.)<\/p>\n<p>More details about the <code>CHECK<\/code> constraint can be found <a href=\"https:\/\/dev.mysql.com\/blog-archive\/mysql-8-0-16-introducing-check-constraint\/\">on the MySQL blog<\/a>.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Is it possible to insert data and update it at the same time?<\/p>\n<\/td>\n<td>\n<p><strong>Yes<\/strong> \u2013 that can be done by using the <code>ON DUPLICATE KEY UPDATE<\/code> option on an <code>INSERT<\/code> statement. The statement is pretty self-explanatory: if inserting a row would create a duplicate value in a <code>UNIQUE<\/code> index or <code>PRIMARY KEY<\/code>, an update of the existing row occurs instead. 
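A hypothetical hit-counter table illustrates this:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">INSERT INTO page_hits (page_id, hits)\r\nVALUES (42, 1)\r\nON DUPLICATE KEY UPDATE hits = hits + 1;<\/pre>\n<p>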
Such a statement is mostly used to increment column values whenever there are duplicate key issues, but it can have other use cases as well.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>Are there any lesser-known ways to improve <code>INSERT<\/code> query performance?<\/p>\n<\/td>\n<td>\n<p><strong>Yes<\/strong> \u2013 if you\u2019re working with bigger data sets, make use of the <code>DEFAULT<\/code> keyword. The keyword lets you specify the default value of a column; as a result, columns with defaults can be filled in automatically when running <code>LOAD DATA INFILE<\/code> queries, saving you time in return.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<p>When to switch to <code>LOAD DATA INFILE<\/code>?<\/p>\n<\/td>\n<td>\n<p><strong>Typically, when you have many rows to load<\/strong> \u2013 there are no set \u201crules\u201d that define when you should switch from <code>INSERT<\/code> statements to <code>LOAD DATA INFILE<\/code>; however, keep in mind the following:<\/p>\n<ul>\n<li>There are several necessary steps that the DBMS must go through when performing <code>INSERT<\/code> operations.<\/li>\n<li><code>INSERT<\/code> operations come with overhead. 
The more data is inserted, the bigger the overhead is.<\/li>\n<li><code>INSERT<\/code> query performance can be optimized a lot by using bulk <code>INSERT<\/code> queries or by delaying index updates and consistency checking.<\/li>\n<li><code>LOAD DATA INFILE<\/code> works best when settings inside of <code>my.cnf<\/code> are optimized for <code>InnoDB<\/code>.<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Balancing INSERT Performance and Reads<\/h2>\n<p>To balance out the performance between <code>INSERT<\/code> and <code>SELECT<\/code> queries, keep the following in mind:<\/p>\n<ul>\n<li><strong>Basics do help<\/strong> \u2013 Select as few rows as possible by using a column list with your <code>SELECT<\/code> instead of <code>SELECT *<\/code>, and index only the columns you\u2019re searching through.<\/li>\n<li><strong>Normalize your tables<\/strong> \u2013 Table normalization helps save storage space as well as increases the speed of your read queries if used properly. <a href=\"https:\/\/www.red-gate.com\/simple-talk\/databases\/sql-server\/t-sql-programming-sql-server\/database-normalization-basics\/\">Choose a normalization method suitable for you<\/a>, then proceed further. <br \/>\nDo note that as with everything, normalization can also have negative effects \u2013 more tables and relationships can mean more individual <code>INSERT<\/code> or <code>LOAD DATA INFILE<\/code> statements, but if you do everything correctly, your database should roll just fine!<\/li>\n<li><strong>Use the proper storage engine<\/strong> \u2013 this should be obvious, but you would be surprised how many inexperienced DBAs make the mistake of using an inappropriate storage engine for their use case. If you\u2019re using MySQL, aim to use <code>InnoDB<\/code> or <code>XtraDB<\/code> for general use cases. 
<br \/>\nIf you\u2019re in a testing environment and memory is not an issue, the <code>MEMORY<\/code> storage engine could also be an option, but note that it cannot be as heavily optimized and that the data won\u2019t be stored on the disk either. At that point, insert queries would be blazing fast, but since the data lives in memory itself, a shutdown of the server would destroy all of your data as well.<\/li>\n<li><strong>Use a powerful server<\/strong> \u2013 this goes hand in hand with optimizing <code>my.cnf<\/code> or other files. If your server is powerful enough, balancing the performance of <code>INSERT<\/code> and <code>SELECT<\/code> operations will be a breeze.<\/li>\n<li><strong>Only use indexes and partitions where necessary and don\u2019t overdo it<\/strong> \u2013 aim to index only the columns you\u2019re running heavy <code>SELECT<\/code> queries on, and only partition your tables if you have more than 50 million records.<\/li>\n<li><strong>Consider ditching INSERT statements altogether<\/strong> \u2013 if you must import huge chunks of data, consider splitting them into files and uploading those files with <code>LOAD DATA INFILE<\/code> instead.<\/li>\n<\/ul>\n<h2>Summary<\/h2>\n<p>In this blog, we\u2019ve walked you through how best to optimize <code>INSERT<\/code> queries for performance and answered some of the most frequently asked questions surrounding these sorts of queries. We\u2019ve also touched upon the importance of balancing <code>INSERT<\/code> and <code>SELECT<\/code> operations and walked you through a way to ditch <code>INSERT<\/code> queries altogether when loading lots of data.<\/p>\n<p>Some of the advice we\u2019ve provided in this blog is known by many DBAs, while some might not be known at all. 
Take from this blog what you will \u2013 there\u2019s no necessity to follow everything outlined in the article step by step, but combining some of the advice in this article with the advice contained in other parts of this series will be invaluable for your database and your applications alike.<\/p>\n<p>We hope that this blog has taught you something new, and we\u2019ll see you in the next one.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Welcome back to the MySQL optimization series! In case you haven\u2019t been following this series, in the past couple of articles we have discussed the basics of query optimization, and told you how to optimize SELECT queries for performance as well. In this blog, we\u2019re further learning ways to optimize INSERT operations and look at&#8230;&hellip;<\/p>\n","protected":false},"author":339547,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,145792],"tags":[],"coauthors":[146040],"class_list":["post-96784","post","type-post","status-publish","format-standard","hentry","category-featured","category-mysql"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/96784","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/339547"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=96784"}],"version-history":[{"count":5,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/96784\/revisions"}],"predecessor-version":[{"id":96812,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/9678
4\/revisions\/96812"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=96784"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=96784"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=96784"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=96784"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}