{"id":93337,"date":"2022-02-08T22:15:01","date_gmt":"2022-02-08T22:15:01","guid":{"rendered":"https:\/\/www.red-gate.com\/simple-talk\/?p=93337"},"modified":"2024-04-16T18:58:58","modified_gmt":"2024-04-16T18:58:58","slug":"back-up-and-restore-postgresql","status":"publish","type":"post","link":"https:\/\/www.red-gate.com\/simple-talk\/databases\/postgresql\/back-up-and-restore-postgresql\/","title":{"rendered":"How to back up and restore in PostgreSQL: Learning PostgreSQL with Grant"},"content":{"rendered":"<p><em><strong>This is part of an ongoing series of posts by Grant as he takes his vast knowledge of SQL Server and applies it to learning PostgreSQL, sharing it with you so you can learn from his triumphs and mistakes. For more, you can go to the <a href=\"https:\/\/www.red-gate.com\/simple-talk\/collections\/series-learning-postgresql-with-grant\/\">Learning PostgreSQL with Grant<\/a> series home page.<\/strong><\/em><\/p>\n\n<p>One of the first things I worry about when I\u2019m looking at a new system is the status of the backups. I don\u2019t see anything in PostgreSQL that will lead me to a different conclusion. However, I didn\u2019t realize just how big a can of worms I would open by pursuing backups and restores.<\/p>\n<p>Instead of talking immediately about backups, let\u2019s talk about restores.<\/p>\n<h2>Restore strategy<\/h2>\n<p>There\u2019s a saying that, as far as I know, originated with <a href=\"https:\/\/www.linkedin.com\/in\/kimberlyltripp\/\">Kimberly Tripp<\/a>: Your backups are only as good as your last restore. In short, you have to take backups, but what\u2019s important is the ability to restore the data. You can have an infinite number of backups, but if you can\u2019t restore even one of them, that\u2019s nothing but wasted space. So, before you start thinking about backups, you should be thinking about how you want to restore your systems.<\/p>\n<p>Restoring a database is an enormous topic. 
This article could get into all sorts of permutations about all the different methods for restoring a database. However, to keep the size of this article to something reasonable, and because I haven\u2019t learned everything I need to just yet, I\u2019m going to focus on one restore mechanism: restore from a backup. It\u2019s also possible to restore to a point in time, but I\u2019m saving the details of how to do that for a future article.<\/p>\n<p>A simple restore from a backup is meant to get the database back to the state it was in right at the point that a backup was taken. This kind of restore operation is acceptable for development and test systems. It\u2019s an easy way to quickly migrate a database to a new server. However, it is limited in that you can only restore to the point of the backup. If data or structural changes that you\u2019d like to recover occurred after the backup, you just can\u2019t do it.<\/p>\n<p>This is where the point-in-time restore comes into play. Most organizations frown on data loss. For example, taking a backup once a day means that, if a problem occurs about 12 hours into the day, you\u2019re going to lose up to 12 hours of data. A point-in-time recovery means you can get the last backup and all the changes to the database since that backup. Point-in-time recoveries are accomplished through the write-ahead log (WAL) maintained by PostgreSQL.<\/p>\n<p>The WAL is automatically created and maintained by your server. You can configure the size and number of files in the WAL. In theory, an infinite number of WAL files can be played back to arrive at a point in time. However, that brings up another aspect of a restore strategy: recovery time.<\/p>\n<p>A recovery strategy must be built to define how you would like to restore and how long you want it to take. Development systems usually require nothing more than a quick database backup. 
Production systems generally need point-in-time restores, so you probably want a database backup along with managed WAL files.<\/p>\n<p>With the requirements defined, you can then walk over and pick up the right backup tool.<\/p>\n<h2>Database backup<\/h2>\n<p>If simple database restores are all you\u2019re interested in, then the best tool for the job is <code>pg_dump<\/code>. The core syntax is simple:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">pg_dump\u00a0databasename\u00a0&gt;\u00a0backupfile<\/pre>\n<p>However, because I\u2019m running my databases either in containers or out in the cloud, this simple syntax won\u2019t work because the service is running under a different login than what I\u2019m using to connect remotely. Instead, I can modify the command slightly:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">pg_dump\u00a0-U\u00a0postgres\u00a0postgrelearning\u00a0&gt;\u00a0butest1<\/pre>\n<p>This command will work because I\u2019m supplying the user that runs the service.<\/p>\n<p>When running PostgreSQL in Docker, you must open a shell in the container like this, where PostgreHome is the name of the container:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">docker\u00a0container\u00a0exec\u00a0-it\u00a0PostgreHome\u00a0\/bin\/bash<\/pre>\n<p>Then running the original syntax just works, and the backup file will be found in the directory where you run the command, unless you supply a path.<\/p>\n<p>When running the command from the external command line, supplying the user and pointing it at the container, it\u2019s going to demand a password. Once supplied, though, it again just works.<\/p>\n<p>You can exercise more control by setting various options in <code>pg_dump<\/code>, and I\u2019ll discuss several of these a little later. However, one option to immediately consider, especially when thinking about how you intend to restore the database, is to change the output. By default, <code>pg_dump<\/code> output goes to a plain SQL file. 
<code>pg_dump<\/code> can also write to a compressed binary file by adding <code>-Fc<\/code>, which is the custom format:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">pg_dump -U postgres -Fc postgrelearning &gt; bucustom.dmp<\/pre>\n<p>Using either of these two methods, you can back up your database. Now let\u2019s talk about running a restore.<\/p>\n<h2>Database recovery<\/h2>\n<p>The output from the default <code>pg_dump<\/code> command is interesting. However, it gets very long, very quickly. I have a very simple database created from other scripts in this series so far. Here\u2019s a small part of the output from <code>pg_dump<\/code> with the default format that you can see by using <code>cat<\/code>:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk \">--\r\n-- PostgreSQL database dump\r\n--\r\n-- Dumped from database version 14.1 (Debian 14.1-1.pgdg110+1)\r\n-- Dumped by pg_dump version 14.1\r\nSET statement_timeout = 0;\r\nSET lock_timeout = 0;\r\nSET idle_in_transaction_session_timeout = 0;\r\nSET client_encoding = 'UTF8';\r\nSET standard_conforming_strings = on;\r\nSELECT pg_catalog.set_config('search_path', '', false);\r\nSET check_function_bodies = false;\r\nSET xmloption = content;\r\nSET client_min_messages = warning;\r\nSET row_security = off;\r\n--\r\n-- Name: hsr; Type: SCHEMA; Schema: -; Owner: postgres\r\n--\r\nCREATE SCHEMA hsr;\r\nALTER SCHEMA hsr OWNER TO postgres;\r\nSET default_tablespace = '';\r\nSET default_table_access_method = heap;\r\n--\r\n-- Name: radiobrand; Type: TABLE; Schema: hsr; Owner: postgres\r\n--\r\nCREATE TABLE hsr.radiobrand (\r\n    radiobrandid integer NOT NULL,\r\n    radiobranddesc character varying(50) NOT NULL\r\n);<\/pre>\n<p>It keeps going from there. In short, <code>pg_dump<\/code> is scripting out the database structures and data into a file. It\u2019s an SQL file that you can then run from <code>psql<\/code> or any other tool. 
You\u2019ll get the objects and data created using the SQL from the <code>pg_dump<\/code> process. Since it does include data, this file could be extremely large, depending on the database.<\/p>\n<p>Recovery, then, is just a question of replaying the script; in other words, just executing the SQL. You could open it in the GUI of your choice, or you could take advantage of the command line utility like this:<\/p>\n<pre class=\"lang:tsql theme:ssms2012-simple-talk\">psql -U postgres postgrelearning &lt; butest1<\/pre>\n<p>Of course, you would have to ensure that you have an empty database; otherwise, this will attempt to rebuild and reinsert data that\u2019s already there. You can drop the existing database, create a new one, and then run the scripts; or you can drop all the objects from the database and then run the scripts.<\/p>\n<p>If you choose to get the custom file format, you won\u2019t have a simple set of SQL scripts, so replaying them requires a new command, <a href=\"https:\/\/www.postgresql.org\/docs\/9.2\/app-pgrestore.html\">pg_restore<\/a>.<\/p>\n<p>The base command to use <code>pg_restore<\/code> is just as simple as <code>pg_dump<\/code>:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">pg_restore -d postgrelearning -U postgres bucustom.dmp<\/pre>\n<p>The same rules apply. The database must be either a new database or one you\u2019ve cleaned, unless you take advantage of the additional functionality that comes with the custom file format. For example, you could restore to a database that still has all its objects inside. You can choose to drop the existing objects first by modifying the command as follows:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">pg_restore -d postgrelearning -U postgres -c bucustom.dmp<\/pre>\n<p>The <code>-c<\/code>, or <code>--clean<\/code>, option will drop objects first; however, you could run into error messages if an object in the backup doesn\u2019t exist in the database you\u2019re restoring over. 
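<\/p>\n<p>For example (hedging a bit here, since I\u2019ve only read about this option so far), adding <code>--if-exists<\/code>, which is only valid alongside <code>-c<\/code>, turns those drops into <code>DROP ... IF EXISTS<\/code> commands so that missing objects don\u2019t generate errors:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">pg_restore -d postgrelearning -U postgres -c --if-exists bucustom.dmp<\/pre>\n<p>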
You can also tell <code>pg_restore<\/code> to create the database first:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">pg_restore -d postgres -U postgres -C bucustom.dmp<\/pre>\n<p>The <code>-C<\/code>, or <code>--create<\/code>, option will try to create a postgrelearning database. If one already exists, you\u2019ll get an error, so you can again modify the <code>pg_restore<\/code> command:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">pg_restore -d postgres -U postgres -C -c bucustom.dmp<\/pre>\n<p>Running both <code>--clean<\/code> and <code>--create<\/code> will result in <code>pg_restore<\/code> dropping and recreating your database. Note that you will initially connect to a different database with the <code>-d<\/code> option and that the restore will use the database name from the dump file.<\/p>\n<p>While running the SQL scripts of a basic <code>pg_dump<\/code> command is simple, you don\u2019t have much control. You will find there are quite a few options if you use the custom file format and <code>pg_restore<\/code>. Using <code>pg_restore<\/code>, you can:<\/p>\n<ul>\n<li>Restore just the data and not the schema<\/li>\n<li>Restore only the schema and no data<\/li>\n<li>Output the contents of the dump file as a list of objects<\/li>\n<li>Define a specific list of objects to restore<\/li>\n<li>Disable triggers<\/li>\n<li>Use multiple concurrent jobs to make the restore run faster<\/li>\n<li>Pick and choose individual objects to restore<\/li>\n<\/ul>\n<p>There are a number of other behaviors you can control in <code>pg_restore<\/code> that are beyond the scope of a getting-started article. However, to summarize, you get a lot more control over exactly what gets restored using <code>pg_restore<\/code> and the custom dump format instead of the standard SQL.<\/p>\n<h2>pg_dump options<\/h2>\n<p>I already introduced one option available to <code>pg_dump<\/code>, the custom file format. 
Several additional options enable you to control what, and how, things get backed up.<\/p>\n<p>The options are supplied on the command line, as previously shown. You can use abbreviations like <code>-a<\/code>, which means only dump the data for a given database, or you can use the full option, <code>--data-only<\/code>. Some options stand alone, and some require you to supply additional information.<\/p>\n<p>For example, you can specify that the dump is ready to clean the database as part of the restore:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">pg_dump -c postgrelearning &gt; pgl.sql<\/pre>\n<p>Now, when you run the script, it will include <code>DROP<\/code> commands.<\/p>\n<p>You can use include or exclude patterns to only move some of the objects in a database. For example, to exclude schemas whose names start with the string <code>dev<\/code>, you could run this:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">pg_dump -N \"dev*\" postgrelearning &gt; nodev.sql<\/pre>\n<p>You also have commands to make it possible to:<\/p>\n<ul>\n<li>Back up just the schema without data<\/li>\n<li>Back up only the data, without a schema definition included<\/li>\n<li>Define multiple concurrent jobs, so the backup occurs faster (this requires the directory output format)<\/li>\n<li>Add in conditional commands, <code>IF EXISTS<\/code>, to clean up<\/li>\n<\/ul>\n<p>And a whole lot more.<\/p>\n<h2>How to back up all the databases<\/h2>\n<p>It\u2019s fairly likely that you have more than one database. You can script out <code>pg_dump<\/code> so that you run it over and over, specifying which database to back up and where it should go. If you have a very customized backup process where each database is unique, this is probably the way to go. However, you also have the option to use <code>pg_dumpall<\/code>. Instead of one database at a time, this will export all the databases on a given server to a single file. The <code>pg_dumpall<\/code> command is basically just a wrapper around <code>pg_dump<\/code>. 
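<\/p>\n<p>The basic usage mirrors <code>pg_dump<\/code>. As a quick sketch (the output file name here is just one I made up), this sends every database on the server to a single SQL file:<\/p>\n<pre class=\"lang:ps theme:powershell-ise\">pg_dumpall -U postgres &gt; allbackup.sql<\/pre>\n<p>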
However, since all databases are being dumped, <code>pg_dumpall<\/code> will also gather the global objects that the databases depend on, such as roles and tablespaces.<\/p>\n<p>The problem with <code>pg_dumpall<\/code> is that it only outputs to the standard SQL format. You can\u2019t get the compressed custom format. This also means you don\u2019t get all the functionality of <code>pg_restore<\/code>. As before, you can only restore to empty databases.<\/p>\n<h2>Other backup mechanisms<\/h2>\n<p>As with almost any major RDBMS, there are third-party solutions out there for PostgreSQL. Also, if you\u2019re running AWS RDS, AWS Aurora, Azure Database for PostgreSQL, or Google Cloud SQL for PostgreSQL, they all come with built-in backup and recovery mechanisms. You can also back up and restore from pgAdmin 4 and restore from Azure Data Studio. Plus, there are multiple scripted solutions available through the <a href=\"https:\/\/www.bing.com\/search?q=postgresql+community+backup+script&amp;qs=n&amp;form=QBRE&amp;msbsrank=5_5__0&amp;sp=-1&amp;pq=postgresql+community+backup+s&amp;sc=5-29&amp;sk=&amp;cvid=AF354E65BC944BE69EFDA739C3959B70\">PostgreSQL community<\/a>. I would strongly recommend learning how to use the native tools to back up and restore your databases. However, once you\u2019ve built the skill set, moving to a different solution is fine.<\/p>\n<h2>Conclusion<\/h2>\n<p>The ability to restore a PostgreSQL database is critical. Getting a quick backup is quite easy with the <code>pg_dump<\/code> command, and it\u2019s most suited for development. This command creates a script file by default that you can replay. There are several other options, including <code>-Fc<\/code> to create a backup with a custom format that can be used with the <code>pg_restore<\/code> command. The <code>pg_dump<\/code> command provides several options, including the ability to drop and create the database. 
If you need to back up all the databases, use the <code>pg_dumpall<\/code> command; however, it\u2019s not as flexible.<\/p>\n<p>Production databases usually need point-in-time recovery, encrypted backups, and often other capabilities, so other options, like WAL archiving or a third-party tool, must be used. I\u2019ve purposely saved the point-in-time restore discussion for another article because it is quite complex compared to <code>pg_dump<\/code>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You can perform PostgreSQL backups with the pg_dump command. In this article, Grant Fritchey explains how to back up and restore with the pg_dump and pg_restore utilities.&hellip;<\/p>\n","protected":false},"author":221792,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[53,143534],"tags":[95506,158977,159066],"coauthors":[6785],"class_list":["post-93337","post","type-post","status-publish","format-standard","hentry","category-featured","category-postgresql","tag-automate","tag-learningpostgresqlwithgrant","tag-postgresql-101-webinar-sidebar"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/93337","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/users\/221792"}],"replies":[{"embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/comments?post=93337"}],"version-history":[{"count":7,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/93337\/revisions"}],"predecessor-version":[{"id":94906,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/posts\/93337\/revisio
ns\/94906"}],"wp:attachment":[{"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/media?parent=93337"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/categories?post=93337"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/tags?post=93337"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.red-gate.com\/simple-talk\/wp-json\/wp\/v2\/coauthors?post=93337"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}